The development of CD-ROM technology has produced significant ramifications
for mass storage applications. The CD-ROM's read-only nature and its ability to store
over 500 megabytes of data on a single disc will eventually revolutionize the historical
and archival database industries. The U.S. Navy is particularly interested in the
space-saving and weight reduction capabilities of CD-ROM as compared to the current
magnetic and paper media. Adaptability and feasibility are the primary issues to be
faced when considering the integration of CD-ROM into U.S. Navy applications. This
study addresses these issues and determines that CD-ROM will play a significant role
in the Navy's efforts to create a "paperless ship" by 1990.
I. INTRODUCTION


A.  GENERAL  REMARKS 

CD-ROMs (Compact Disc Read Only Memories) provide computer software
applications developers with intriguing possibilities of making hundreds of megabytes,
even gigabytes, of data readily accessible to personal computer users. Such massive
storage capacity opens up new realms of potential applications for microcomputer
software developers.

The  CD-ROM  has  a  thousand  times  the  storage  capacity  of  a  floppy  disk.  In  the 
computer  industry,  we  often  improve  things  by  a  factor  of  two  or  three  and  the  new 
applications  are  considered  evolutionary.  But  a  one  thousandfold  increase  in  storage 
capacity enables us to create rich and multifaceted new applications. (Gates, 1986,
p. XI)

Furthermore,  a  floppy  disk  can  store  only  a  few  seconds  of  full  motion,  full 
screen  color  video,  whereas  a  single  CD  can  store  as  much  as  an  hour  of  such  video 
images.  The  floppy  can  store  only  three  seconds  of  high-quality  audio,  but  the  CD  can 
store  an  hour.  It  is  this  remarkable  power  of  the  CD-ROM  disc  to  digitally  store  video 
images,  audio,  data,  and  computer  code  in  any  combination  that  emphasizes  its  vast 
potential. 

CD-ROM  technology  is  derived  from  CD  audio  technology  and  uses  the  same 
basic drive mechanisms and disc manufacturing processes. Because of this close
relationship, CD-ROM player and disc development has benefitted directly from the
technological advances and cost reductions associated with the rapid growth of the CD
audio industry. (Einberger, 1987, p. 31)

B.  THE  TLOCD  SYSTEM 

Transaction  Ledger  on  Compact  Disc  (TLOCD)  is  the  culmination  of  a  U.S. 
Navy  supported  thesis  project  conducted  in  the  spring  of  1987  at  the  Naval 
Postgraduate School in Monterey, California. It involved the transfer of some
3,[illegible] records containing historical transaction data from a magnetic tape medium
to a CD-ROM disc. The records represented all transactions conducted by the Naval
Supply Center at Oakland, California, for the months of October and November 1986.
The records were arranged into three types of files according to their particular
[illegible]. The "Closing Balance" files contain such [illegible] as quantity on hand and
quantity on order. The "Audit Trail" files consist of pertinent information about
previous transactions.

Reference Technology Inc. of Boulder, Colorado, was tasked with transferring
the data, creating the indexes, and pressing the disc. They also provided the system
software to interface between IBM compatible personal computers and the CLASIX
DataDrive Series 500 disc player manufactured by Hitachi. A list of the hardware and
software initially utilized by the TLOCD system can be found in Table 1.


TABLE 1

TLOCD HARDWARE AND SOFTWARE CONFIGURATION

HARDWARE

Zenith Z-248 PC (IBM PC/AT Compatible) with
- 20 Mbyte Winchester Drive
- One 360K double-sided, double-density 5 1/4 inch floppy disk drive
- 640K RAM
- Intel 80286 16-bit Microprocessor
- 8 MHz System Clock

Zenith RGB/Enhanced Color Monitor

CLASIX(tm) DataDrive(tm) Series 500

SOFTWARE

Standard File Manager
Key Record Manager
Application-specific file access software

Source: Lind Thesis, p. 56.


[illegible] of the TLOCD system attempts to identify an alternative [illegible]
commitment [illegible] recently installed TANDEM systems at the [illegible]. The systems
are saturated with the Transaction Ledger on Disk (TLOD), thus precluding them from
being utilized for more productive tasks. TLOCD allows the user to query data in much
the same way as the TLOD system. The only difference is in the more effective CD-ROM
storage medium used by TLOCD. However, the user never actually has to know whether
the data is stored by conventional means or whether it resides on a CD-ROM.

C.  OBJECTIVES 

Unless the file structures for a CD-ROM application are designed carefully, the
application's performance is likely to suffer. Typically, poor CD-ROM performance is
the result of file-structure design that reflects "magnetic-disk think." Application
designers often tend to apply rules of thumb learned from working with magnetic
media. Instead, one needs to focus on the unique strengths and weaknesses of the
CD-ROM. (Zoellick, 1986, p. 177)

It is the purpose of this paper to examine these strengths and weaknesses in the
areas of indexing, file management, and application software issues and to make
recommendations  to  be  considered  by  future  Navy  research  and  development  in  mass 
storage  applications.  Additionally,  the  feasibility  and  adaptability  of  CD-ROM 
technology  into  U.S.  Navy  environments  will  be  addressed.  The  TLOCD  prototype  will 
be  referenced  throughout  this  report. 




II.  CD-ROM  OVERVIEW 


A.  GENERAL  REMARKS 

CD-ROM enjoys tremendous leverage from the success of digital audio.
Both products use the same 12 centimeter plastic disc for storing data, and both
employ the same basic manufacturing and playback technologies. CD-ROM thus
benefits from the volume-related cost savings that have driven down the prices of
digital  audio  and  made  it  so  popular  and  affordable. 

The raw specifications of CD-ROM are staggering. A single 4.72 inch disc stores
550 megabytes of data, the equivalent of 1,500 floppy disks or 28 20-megabyte hard
disks. That is [illegible] pages, [illegible] books, whole encyclopedias. Yet any piece of
information on the disc can be located and displayed in two or three seconds. (Defray,
1986, p. 4)

B.  PHYSICAL  FORMAT 

The  CD-ROM's  physical  format  is  defined  by  a  standard  developed  by  the 
Philips  and  Sony  corporations  and  is  an  extension  of  their  compact  digital  audio  disc 
standard.  However,  this  digital  audio  parentage  also  constrains  the  CD-ROM  to  an 
unimpressive  random-seek  performance.  In  particular,  the  underlying  digital  audio 
format  results  in  a  data  format  that  is  based  on  constant  linear  velocity  (CLV) 
recording. 

Most  magnetic  disks  use  constant  angular  velocity  (CAV)  format.  Figure  2.1 
shows  the  sector  organization  of  a  typical  magnetic  disk.  Note  that  the  sectors  on  the 
inner  tracks  are  smaller  than  those  on  the  outer  tracks.  This  is  because  CAV  is 
another  way  of  saying  constant  rotational  speed.  With  a  CAV  format,  the  linear 
velocity  of  the  disk  surface  relative  to  the  disk  head  is  greater  on  the  outer  tracks  where 
the  disk's  circumference  is  greater.  The  outer  sectors  are  also  physically  larger. 

Source: BYTE, May 1986.

Figure 2.1 Sector Organization of a CAV Magnetic Disk.

Figure 2.2 illustrates the CLV sector format of a CD-ROM. The relative speed of
the disc surface and disc head stays the same, even as the head moves away from the
center of the disc. A CD-ROM drive maintains this constant linear velocity by actually
changing the disc's rotational speed as the head moves from track to track. The CLV
format results in sectors of equal length. The actual number of sectors encountered in a
single disc rotation ranges from about nine on the inside of the disc to about 20 on the
outer edge. Therefore, recording must be done in a spiral rather than in a series of
concentric rings. Recording begins at the inside of the disc and spirals outward.

The  great  advantage  that  CAV  recording  has  over  the  CD-ROM's  CLV  format  is 
that  the  CAV  organization  makes  it  easier  to  find  the  beginning  of  a  particular  sector. 
Suppose  one  wants  to  jump  to  a  specific  sector  relative  to  the  start  of  a  file.  With  a 
CAV  format,  where  each  track  contains  a  fixed  number  of  sectors,  it  is  very  easy  to 
translate this relative sector number into an absolute track and sector address, given
the  track  and  sector  address  of  the  start  of  the  file. 
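
The CAV translation described above can be sketched in a few lines. This is only an illustration, assuming a hypothetical disk on which every track holds the same number of sectors and the file occupies consecutive sectors; the function name and the sample geometry (17 sectors per track) are not taken from the source.

```python
def cav_absolute_address(start_track, start_sector, relative_sector, sectors_per_track):
    """Translate a sector number relative to the start of a file into an
    absolute (track, sector) address on a CAV disk.

    Assumes a fixed number of sectors per track and a contiguous file.
    """
    # Linear position of the file's first sector, counted from track 0, sector 0.
    start_linear = start_track * sectors_per_track + start_sector
    # Add the relative offset, then split the result back into track and sector.
    return divmod(start_linear + relative_sector, sectors_per_track)


# Hypothetical geometry: 17 sectors per track, file starting at track 3, sector 5.
track, sector = cav_absolute_address(3, 5, 40, 17)
print(track, sector)  # 5 11
```

No comparable one-line calculation exists for a CLV disc, because the divisor (sectors per track) changes from track to track.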

There  is  no  simple,  fixed  relationship  between  a  CLV  track  and  the  number  of 
sectors  on  the  track.  Therefore,  translating  a  relative  sector  number  into  an  absolute 
track  and  sector  address  is  more  complicated.  In  addition,  head  movement  must  be 
accompanied  by  the  mechanical  process  of  speeding  up  or  slowing  down  the  rotational 
speed of the disc. Together these account for a major part of the CD-ROM's relatively
poor  performance  in  locating  the  desired  track.  The  time  required  to  find  the  beginning 
of  a  particular  track  is  referred  to  as  seek  time. 


Source: BYTE, May 1986.

Figure 2.2 Sector Organization of a CLV CD-ROM Disc.

On  the  positive  side,  CLV  recording  makes  more  efficient  use  of  the  disc  surface. 
Rather  than  spreading  out  data  on  the  outer  tracks  as  on  a  CAV  disk,  the  CLV  format 
packs  the  data  on  the  outer  tracks  just  as  tightly  as  on  the  inner  tracks.  As  a 
consequence,  a  CLV  disc  can  hold  much  more  information  than  a  comparably  sized 
CAV  disk.  From  the  standpoint  of  audio  recording,  where  the  primary  mode  of  access 
is  sequential,  the  CLV  format  is  ideal.  It  packs  the  maximum  amount  of  music  on  a 
disc  without  exacting  a  performance  penalty.  However,  when  you  build  a  data  format 
on top of this audio format, you pay for increased capacity with decreased seek
performance. (Zoellick, 1986, p. 178)

C.  PHYSICAL  ADDRESSING 

The  CD-ROM's  CLV  format  rules  out  using  the  familiar  track  and  sector 
addressing  schemes  used  for  most  magnetic  disks.  Instead,  the  CD-ROM  uses  a 
scheme that can be traced directly to its audio background. Each disc is said to have 60
"minutes" worth of data. Each minute is composed of 60 seconds and each second is
made up of 75 sectors. A single sector can hold 2K bytes of data. Therefore, the entire
disc can hold 540,000K (60 x 60 x 75 x 2K) bytes. The origin of the disc is specified as
0:0:0 (zero minutes, zero seconds, sector zero).
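
The minute/second/sector arithmetic can be checked with a short sketch. The constants come from the figures given above (60 minutes, 60 seconds per minute, 75 sectors per second, 2K bytes per sector); the function names are illustrative and do not belong to any particular driver or operating system.

```python
SECONDS_PER_MINUTE = 60
SECTORS_PER_SECOND = 75
SECTOR_BYTES = 2048      # 2K bytes of data per sector
MINUTES_PER_DISC = 60


def msf_to_sector(minute, second, sector):
    """Convert a minute:second:sector address into a sequential sector number."""
    if not (0 <= second < SECONDS_PER_MINUTE and 0 <= sector < SECTORS_PER_SECOND):
        raise ValueError("second or sector out of range")
    return (minute * SECONDS_PER_MINUTE + second) * SECTORS_PER_SECOND + sector


def sector_to_msf(n):
    """Convert a sequential sector number back into (minute, second, sector)."""
    seconds, sector = divmod(n, SECTORS_PER_SECOND)
    minute, second = divmod(seconds, SECONDS_PER_MINUTE)
    return minute, second, sector


total_sectors = MINUTES_PER_DISC * SECONDS_PER_MINUTE * SECTORS_PER_SECOND
capacity_kbytes = total_sectors * SECTOR_BYTES // 1024
print(total_sectors, capacity_kbytes)  # 270000 540000
```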

Application  developers  need  not  worry  about  the  physical  addressing  details  on 
CD-ROMs,  just  as  they  do  not  concern  themselves  with  such  details  on  magnetic 
media.  The  operating  system  will  convert  the  physical  view  into  a  logical  view,  allowing 
the  disk  to  be  regarded  as  a  collection  of  named  files  rather  than  a  collection  of  tracks 
and sectors. Laser-disc operating systems provide the same type of support for
CD-ROMs.

D.  PERFORMANCE  MEASUREMENT 

Good  CD-ROM  software  design  must  reflect  an  awareness  of  the  CD-ROM's 
weaknesses, in particular its poor seek performance. Table 2 compares a typical
CD-ROM drive with two different types of magnetic-disk drives. The comparisons include
capacity, seek performance, and data-streaming performance during a series of
sequential  reads  of  contiguous  data.  The  sequential-read  performance  on  the  magnetic 
disk  assumes  an  interleave  factor  of  five,  meaning  that  it  takes  five  disk  revolutions  to 
read  all  the  data  in  a  given  track. 

An  average  seek  on  a  full  CD-ROM  takes  five  times  as  long  as  on  a  10-megabyte 
hard  disk.  When  compared  to  a  high-performance  magnetic  disk,  there  is  more  than  an 
order  of  magnitude  of  difference  in  the  seek  performance.  When  designing  software  for 
a  magnetic  disk,  a  major  effort  to  avoid  seeks  should  be  made.  Given  the  cost  of  seeks 
on a CD-ROM, even more stringent measures should be taken to avoid an average
seek. (Zoellick, 1986, p. 180)

However, Table 2 demonstrates that the cost of a short seek covering only a few
tracks  is  relatively  small.  This  is  because  the  CD-ROM  only  needs  to  move  the  mirror 
used  to  position  the  laser  beam  on  the  disc.  It  does  not  have  to  move  the  sled 
containing  the  mirror,  lenses,  and  other  parts  of  the  disc-reading  mechanism.  Instead, 
the  laser  bounces  a  pinpoint  of  light  off  the  CD-ROM's  surface,  which  consists  of  a 
pattern  of  submicroscopic  pits.  This  information  is  converted  into  a  digital  signal  and 
read  by  an  optical  disc  drive. 

This disparity between the cost of a short, local seek and a longer one is of
significant importance. It means that every opportunity should be taken to minimize
the physical distance between parts of a file to be used in succession. Since the
CD-ROM's sequential-read performance as shown in Table 2 is very respectable, reading a
large  block  of  data  does  not  cost  that  much  more  than  reading  a  short  one.  The 
primary  cost  is  in  locating  or  finding  the  block. 
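
A rough cost model makes the point concrete. The sketch below assumes the drive streams 75 sectors of 2K bytes per second, as the addressing scheme suggests; the two seek times are hypothetical placeholders chosen only for illustration and are not the values from Table 2.

```python
SECTOR_BYTES = 2048
SECTORS_PER_SECOND = 75  # assumed streaming rate, taken from the addressing scheme


def read_time(seek_seconds, nbytes):
    """Estimated time to perform one seek and then read nbytes sequentially."""
    sectors = -(-nbytes // SECTOR_BYTES)  # round up to whole sectors
    return seek_seconds + sectors / SECTORS_PER_SECOND


LONG_SEEK = 1.0     # hypothetical long seek, in seconds
SHORT_SEEK = 0.001  # hypothetical few-track seek, in seconds

for label, seek in (("long seek", LONG_SEEK), ("short seek", SHORT_SEEK)):
    for size in (2048, 32 * 1024):
        print(f"{label}, {size:6d} bytes: {read_time(seek, size):.3f} s")
```

Under these assumed figures, a sixteen-fold increase in block size adds only about a fifth of a second after a long seek, while replacing the long seek with a short one removes almost the entire cost.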


E.  CD-ROM  BENEFITS 

The  CD-ROM's  adequate  sequential-read  performance  and  its  ability  to  rapidly 
seek over the range of a few tracks are important to the design of good software. Its
most beneficial characteristic is that it is a read-only medium. It is nonerasable. For
applications demanding secure storage of original versions of valuable documents,
images, or data streams, the primary advantage of nonerasability is evident: once the
data are recorded, nobody can modify or erase them short of physically destroying the
media. (Moore, 1984, p. 72)

Two other benefits arise from the fact that a CD-ROM has a read-only nature.
First of all, there are never any concerns with insertions, deletions, or modifications.
Therefore, when building a tree, the most frequently used records can be placed in the
nodes nearest the root because they are never going to change. Secondly, the costs of
writing and reading are not equally balanced. A CD-ROM is written only once but is
read over and over again. Therefore, more time and effort should be put into the initial
construction of files and indexes in order to obtain the fastest retrieval possible.
Furthermore, building the file and index structures is often done on a larger machine,
while the retrieval is most likely to be done on a micro. If expensive tasks such as
[illegible] analysis and text formatting are necessary, it is better to do them once with the


TABLE  3 

ADVANTAGES  OF  CD-ROM 


•  PERMANENT/DURABLE:  It  is  an  excellent  archival  medium  (currently  Sony 
disks  are  guaranteed  for  50  years.)  Also  very  rugged  and  able  to  withstand 
adverse  weather  and  handling  conditions. 

• NON-VOLATILE: No loss or altering of data during power failure or surges.

• LOW COST: The 'per MB' cost of data is less than that of any other storage medium.

• EXTREMELY PORTABLE: The media is removable and offers portability of
data.

• SECURITY: Physical control can be maintained easily and thus large
quantities of sensitive data can be controlled. Also, the possibility exists to
manufacture the disk out of glass instead of polycarbonate material and thus,
for military purposes, emergency destruction could be easily accomplished.

•  SMALL  PHYSICAL  VOLUME/WEIGHT:  Easily  carried,  or  mailed  etc,  at  a  very 
reasonable  expense. 

•  NOT  ABLE  TO  BE  ALTERED:  This  media  is  Read  Only  Memory  (ROM)  and  as 
such,  it  is  extremely  useful  for  audit  trails  in  the  legal  and  financial  world 
where  magnetic  media  have  not  been  allowed  as  evidence  due  to  the 
alterability  of  that  media. 

•  ENORMOUS  DATA  STORAGE  CAPABILITY:  Up  to  600  MB  of  data  on  a  single 
side  of  a  single  disk  which  is  only  4.72  inches  in  diameter. 

• USER FAMILIARITY: It is simply another PC peripheral that, to the user,
looks just like a read-only MS-DOS etc. disk. Also, the average user has had
experience with the same physical disk in the CD-Audio environment and
therefore feels more comfortable with it already.

• BACKUP IS ELIMINATED: There is no need to back up the disk because it is
ROM. For safety's sake, multiple copies can be ordered at the time of disk
pressing and stored in separate locations.

• ELECTRO-MAGNETIC PULSE (EMP) HAS NO EFFECT: This is not a magnetic
media and therefore any sort of electro-magnetic energy has no effect on it.

• NO HEAD-CRASHES: The read-device is optical and does not contact the
disk in any way; therefore, head-crashes are virtually eliminated.


Source: Lind Thesis, p. 26.
III. CD-ROM APPLICATIONS

A.  GENERAL  REMARKS 

The  basic  technology  for  read-only  optical  discs  was  developed  to  distribute 
movies  and  high-fidelity  music.  Consumer  electronics  companies  spent  hundreds  of 
millions of dollars over the past decade in Europe, Japan, and the United States to
make the videodisc and audiodisc inexpensive, reliable, and long lasting. As a result,
data distribution on CD-ROMs was a natural and direct extension of the basic
technology. (Hensel, 1986, p. 487)

Information  users  who  have  access  to  a  microcomputer  and  optical  disc  player 
are now able to access entire collections of databases that have been placed on
CD-ROM. The resulting savings are significant. Even if there is no other reason for buying
the microcomputer and disc player, they pay for themselves with a few hours of
activity per week when the alternative is online connect charges. However, much
greater savings are possible. The Internal Revenue Service has begun a project entitled
"File Archival Image Storage and Retrieval" which it estimates will save as much as
$36 million annually in storage costs. (Contract, 1986, p. 18)

B.  LIBRARY  APPLICATIONS 

CD-ROM  library  applications  are  essentially  of  two  types.  On  the  one  hand  they 
are  designed  as  support  tools  for  library  automation  activities,  including  traditional 
book  cataloging  and  local  public  access  catalogs.  On  the  other  hand,  they  provide 
inexpensive  around-the-clock  availability  of  databases  previously  produced  in  paper 
format. (Melin, 1987, p. 509)

A critical problem often faced by librarians is the growth of their collections,
especially the periodical and resource indexes. Increasing volumes of new data, in both
print and microform, have meant that increased space is needed to house them. The
ability of CD-ROM to store hundreds of thousands of pages in a limited space is very
appealing for this very reason. The medium is practically indestructible. Not only can
dozens of books be stored on disc, but rare and fragile documents, never before made
available to the public, can also be stored in their original form without concern that
they will be damaged or destroyed by patrons.
Grolier Encyclopedia has already produced a version of the Academic American
Encyclopedia on optical disc. Also, the Library of Congress is currently conducting a
special optical disc pilot program that includes rapid high-resolution scanning, storage
and retrieval of images of journal titles, law materials, manuscripts, sheet music, maps,
and technical reports. The British Library is experimenting with the development of
bibliographic files on CD-ROM.

Moreover, Software Mart, Inc. (SMI) has developed an illustrative dictionary
with voice annotation on CD-ROM. It is called The Visual Dictionary and could propel
illustrated consumer dictionaries into foreign language training vehicles. (Kuhn,
19[illegible])


C.  MEDICAL  AND  LEGAL  APPLICATIONS 

It can be argued that where knowledge is concise, it should be delivered in a
concise way. This is particularly applicable to clinical, action-oriented knowledge.
(Huntting, 1986, p. 529) Micromedex, Inc. has applied this approach with considerable
success and has produced the first medical information product to actually achieve
commercially successful distribution with its "Computerized Clinical Information
System" (CCIS). The application utilizes highly structured menus that combine easily
understood screen displays to bring clinical management protocols into the emergency
room with remarkable speed and precision. This design is successful because it
recognizes that the emergency room physician or poison center technician is not
working in a contemplative environment when he or she has need for the product. On
the contrary, there are a multitude of distractions, perhaps even a life hanging in the
balance. Consequently, the information must be delivered concisely and accurately
with no time for discussion or debate. (Huntting, 1986, p. 531)

The  world-wide  use  of  CD-ROM  in  the  medical  and  health  fields  continues  to 
grow.  The  Canadian  Center  of  Occupational  Health  and  Safety  has  incorporated  the 
largest  publicly  available  chemical  database  onto  a  CD-ROM  and  has  included  it  in  its 
efforts to improve data distribution and employee safety programs. (Abeytunga, 1987,
p. 1)

Attorneys  and  tax  accountants  must  review  a  tremendous  amount  of  reference 
material  that  may  be  relevant  to  their  clients'  legal  or  tax  needs.  Equipped  with  an 
entire electronic library at their fingertips, attorneys and tax accountants are sure to
find it easier to track down and review material and thus improve their ability to serve
their clients. CD-ROM is an ideal medium for many legal applications dealing with
taxes, statutes, case histories, legal forms, and patents.
D. CARTOGRAPHY APPLICATIONS

One CD-ROM can store a complete digital map of every street in New England
plus additional information equivalent to 3[illegible] unabridged copies of Moby Dick. The
basic map information, judiciously compressed, amounts to 120 to 150 bytes per street.
Since [illegible] percent of the U.S. population lives on about one million streets represented
in the Census Bureau's files, a simple extrapolation, allowing for rural streets that
wiggle more than their urban counterparts, yields a nationwide digital map that will fit
on a single CD-ROM. (Cooke, 1986, p. 560)

It  would  be  more  appropriate  to  publish  regional  or  state  discs  supplemented 
with  a  wealth  of  information  targeted  for  specific  markets.  The  business  edition,  for 
example,  would  contain  a  list  of  all  companies  in  the  region  indexed  by  both  industrial 
classification  and  geographic  location.  The  family  edition  would  have  data  about 
restaurants,  tourist  attractions,  shopping  centers,  stores,  and  museums. 

DeLorme Mapping Systems of Freeport, Maine, has stored DeLorme's World
Atlas on CD-ROM. Also, the Compaq Deskpro 386 displays maps of the entire earth
from one laser disc in conjunction with a personal computer (Vizachero, 1986, p. 58).

LaserPlot, Inc. has produced the first CD-ROM-based position tracking system
for marine navigation. It displays full-color, digitized National Oceanic and
Atmospheric Administration (NOAA) charts in various scales (Belanger, 1987, p. 13).

E. U.S. NAVY APPLICATIONS

Current investigation into the interests of CD-ROM technology in the U.S. Navy
revealed a NAVSEA sponsored project entitled "Computer-Aided Technical
Information System" (CATIS). CATIS is primarily involved with the placing of
engineering technical manuals for the Trident-Class submarines onto CD-ROM discs.

Further  investigation  discovered  an  ongoing  project  at  the  Naval  Ship  Weapons 
System Engineering Station (NSWSES) in Port Hueneme, California. The project has
been dubbed "Engineering Data Management Information and Control System"
(EDMICS) and is involved with placing engineering diagrams onto CD-ROMs for use
by major industrial facilities. (Lind, 1987, p. 69)

Image Conversion Technologies has been awarded a $2.5 million contract for
image management services for the "Naval Print on Demand" system. ICT will digitize
about 1.8 million pages of military specifications to be stored on two [illegible]-gigabyte
optical disc library units. ICT's management system will be used for storage, indexing,
and retrieval of all documents to be printed, while its order-entry system will be used to
manage orders and perform administrative operations. The anticipated printing volume
is 22[illegible] pages per day with a required turn-around time of two days. (Lind, 1987,
p. [illegible])

The  Navy  is  also  conducting  research  on  CD-ROM  technology  at  the  Naval 
Postgraduate  School  in  Monterey,  California.  The  thrust  of  this  research  is  concerned 
with  the  adaptability  of  systems  such  as  the  TLOCD  prototype  addressed  in  the 
introduction  of  this  paper. 

IV. THE TYPICAL CD-ROM DATABASE


A.  DATA  FILES 

1.  Data  Records 

The purpose of any database is to provide access to its data records. The data
records in a CD-ROM database can be of either fixed length or variable length. The
maximum size of a CD-ROM record is 2,147,483,647 bytes, but there must be a
memory buffer large enough for the largest record to be read.

2.  Data  Records  and  Keys 

Keys  are  fixed-length  byte  strings  which  are  organized  into  indexes  to  provide 
access  to  the  data  records.  Keys  do  not  have  to  be  physically  contained  in  the  data 
records  and  the  structure  of  the  records  need  only  be  known  to  the  application 
program.  However,  if  the  keys  are  contained  in  the  records  at  fixed  offsets  from  their 
beginning  then  this  information  can  be  stored  in  the  index  headers,  thus  allowing  them 
to  be  accessed  by  application  programs. 
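
When keys sit at a fixed offset inside each record, key extraction is mechanical. The following is a minimal sketch under that assumption; the record layout (a 4-byte key at offset 0) and the function name are hypothetical and are not drawn from the TLOCD data or the Reference Technology software.

```python
def extract_keys(records, key_offset, key_length):
    """Build a sorted list of (key, record_number) pairs from keys stored
    at a fixed offset within each record."""
    keys = []
    for number, record in enumerate(records):
        key = record[key_offset:key_offset + key_length]
        if len(key) != key_length:
            raise ValueError(f"record {number} is too short to contain the key")
        keys.append((key, number))
    # Sorting by key gives the order in which an index would present the records.
    return sorted(keys)


# Hypothetical records: a 4-byte key followed by a description field.
records = [b"0003WIDGET  ", b"0001BOLT    ", b"0002GASKET  "]
print(extract_keys(records, key_offset=0, key_length=4))
# [(b'0001', 1), (b'0002', 2), (b'0003', 0)]
```

Only the offset and length need to be recorded (for instance in an index header) for any program to locate the key without knowing the rest of the record structure.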

3.  Data  Records  and  Indexes 

Data  record  keys  are  arranged  into  indexes.  Indexing  makes  it  seem  that  the 
records  of  a  data  file  are  arranged  in  the  order  of  the  keys  for  that  particular  index. 
Because  multiple  indexes  can  be  supported,  there  may  be  as  many  orders  to  the  records 
as there are indexes.

4.  Physical  and  Logical  Data  Files 

Files of data records are provided by the information publisher. For example,
the Naval Supply Center in Oakland provided Reference Technology with the data
records required for the TLOCD project. The TLOCD application can handle up to 32
files, which is the limit imposed by the Reference Technology file management system.
These files can be placed on either optical or magnetic devices or both. All the physical
files are logically concatenated to form a single logical data file, and the offsets in the
indexes refer to offsets from the beginning of this logical file. A limited update
capability can be supported with multiple data files by logically appending new data
files to existing data files and creating new indexes for the resulting logical data file.
(Key, 1986, p. 1-1)
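
The logical concatenation of physical files can be pictured as a lookup from a logical offset to a file number and an offset within that file. The sketch below is an illustration only; the function names and file sizes are hypothetical, and it does not reproduce the Reference Technology file manager.

```python
from bisect import bisect_right
from itertools import accumulate


def build_start_offsets(file_sizes):
    """Return the logical offset at which each physical file begins."""
    return [0] + list(accumulate(file_sizes))[:-1]


def locate(logical_offset, file_sizes):
    """Map an offset in the logical data file to (file number, offset within file)."""
    starts = build_start_offsets(file_sizes)
    if not 0 <= logical_offset < sum(file_sizes):
        raise ValueError("offset lies outside the logical data file")
    # The containing file is the last one whose start is <= the logical offset.
    file_number = bisect_right(starts, logical_offset) - 1
    return file_number, logical_offset - starts[file_number]


sizes = [1000, 2500, 400]  # hypothetical physical file sizes in bytes
print(locate(0, sizes), locate(1000, sizes), locate(3600, sizes))
# (0, 0) (1, 0) (2, 100)
```

Appending a new physical file extends the list of sizes without changing any existing logical offset, which is why new data can be added on magnetic media while the original files stay on the disc; only the indexes have to be rebuilt.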

3. Creating Key Record Files

Key record files can be created by the CD-ROM manufacturer or by the data
publisher. The decision should be based on the structure of the data records. If the key
is in a fixed location in a data record, the key records can be generated automatically
by the disc manufacturer. Otherwise, the key records must be provided by the
publisher in the format described in Figure 4.1.

C.  INDEX  FILES 

1.  Indexes 

Indexes are created by putting sorted key records into an index. Each key
index provides access to the data records in the order of the key records that compose
it. Key records for an index may be arranged in either ascending or descending order.
Each index is assigned an integer identifier, beginning with zero, which is always the
data index. Subsequent key indexes are assigned integers beginning with one.

The key records in the data index contain only the byte offsets of the data
records in the logical data file. Since the data index is keyed by the record offsets, it
provides sequential access to the records in the order they were received by the
manufacturer. The data index for databases with records of fixed length is normally a
virtual index. For databases with records of variable length, a balanced-tree index
containing the record offsets is created. This makes it possible to find a record either by
sequential position in the sequence of data records, or by byte offset in the logical data
file.

The maximum number of indexes to a Reference Technology database is
2,147,483,647. However, the number of indexes which can be accessed at one time is
limited by available memory allocation. Each open index in the database requires
memory for an Index Control Block (89 bytes, plus 12 bytes for each level of index)
and for a key record buffer. Assuming two-level indexes and 32-byte key records, an
IBM PC with 384 Kbytes of available memory could support 2711 open indexes. (Key,
1986)
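
The memory figure quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that one Kbyte is 1024 bytes and that the key record buffer holds exactly one key record.

```python
# Figures quoted in the text for the Key Record Manager.
CONTROL_BLOCK_BYTES = 89   # fixed part of an Index Control Block
BYTES_PER_LEVEL = 12       # added for each level of the index


def open_index_bytes(levels: int, key_record_bytes: int) -> int:
    """Memory for one open index: control block plus one key record buffer (assumed)."""
    return CONTROL_BLOCK_BYTES + BYTES_PER_LEVEL * levels + key_record_bytes


def max_open_indexes(available_bytes: int, levels: int, key_record_bytes: int) -> int:
    """Whole number of indexes that fit in the available memory."""
    return available_bytes // open_index_bytes(levels, key_record_bytes)


per_index = open_index_bytes(levels=2, key_record_bytes=32)
supported = max_open_indexes(384 * 1024, levels=2, key_record_bytes=32)
print(per_index, supported)
```

Under these assumptions each open index costs 145 bytes, which reproduces the figure of 2711 open indexes.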

2.  Hash  Table  Indexes 

Well-designed hash tables support exact-match key searches with at most one
disc access. Positioning by key order will require at most two disc accesses. Partial-match
searches are supported, but will require approximately twice as many seeks as
the logarithm base two of the number of index pages in the hash table. (Key, 1986)

Hash tables are extended to include a key order record number. A cross-reference
table is appended to the hash table to allow positioning by key order with the overhead
of a single additional disc access, thereby allowing a binary search of the hash table
for partial matches.

3.  Balanced  Tree  Indexes 

A balanced tree for each index is produced by placing key records in fixed-length
index pages, which are arranged in a tree so that examining the records in a
page of the tree at one level tells which page to examine at the next lower level. Since
there is only one page at the top level, only one page on each level needs to be
examined to locate a specified key.

D.  CONFIGURATION  FILES 

A configuration file contains the specifications (the complete volume, path,
and name) of each of the data files and index files that make up a database. Its
function is to map the logical correspondences between index identifiers and the
physical indexes. Performance considerations may require certain index files to be
copied to a magnetic device. For this reason, a configuration file contains only
printable ASCII characters. This allows the use of a text editor to modify the volumes
or paths in a magnetic copy of a configuration file. (Key, 1986, p. 24)
V.  KEY  RECORD  UTILIZATION


A.  KEY  RECORD  MANAGER 

Key Record Manager is a software access program for files with structured fields
and records. It was designed by Reference Technology primarily as a tool to be used in
conjunction with CD-ROM databases. It provides an Indexed Sequential Access
Method (ISAM) comparable to mainframe retrieval systems for record-oriented
databases. The Key Record Manager allows for two index structures, a balanced tree
and a hash table. The Key Record Manager software is implemented as a library of C
language functions that can be linked to application programs which require access to
supported databases.

B.  SAMPLE  DATABASE 

CD-ROM  databases  normally  consist  of  large  files,  each  organized  into  similarly 
structured  data  records  which  are  divided  into  fields.  The  data  record  fields  consist  of 
key  fields  which  are  indexed  and  data  fields  which  are  not.  The  easiest  way  to 
conceptualize  such  a  database  is  in  two  dimensions.  A  data  record,  the  individual  entry 
for  a  database,  is  the  row;  the  field  is  part  of  a  column  of  similar  information  for  each 
of  the  rows. 

Figure  5.1  is  an  example  of  a  simplified,  fictitious  stock  market  database.  It  was 
reproduced  from  Reference  Technology's  Key  Record  Manager  and  will  be  referred  to 
throughout  the  remainder  of  this  chapter.  The  data  records  in  this  example  are  of 
variable  length  and  are  arranged  in  the  alphabetical  order  of  their  ticker  tape  symbols. 

The offset field refers to the offset of the record from the beginning of the data
file. It is not usually represented within the record but is implicit in the ordering of the
records within the file. The comment field is text which is not shown completely
because it varies in length for each company.

C.  USING  KEYS  TO  BUILD  KEY  RECORDS

There must be a sorted file of key records in order to construct indexes. It should
be placed in a hash table or tree for quick access. The key fields of the records are
used to create key records which contain a copy of the key field and the offset of the
record associated with that particular key field in the data file. Figure 5.2 shows a key
record generated from the Dividend field in one of the data records.
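
As a rough illustration of this step, the sketch below builds sorted key records (a copy of the key field plus the record offset) from a handful of data records. The `KeyRecord` type, the `build_key_records` function, and the sample offsets and names are hypothetical and are not taken from the Key Record Manager library.

```python
from typing import NamedTuple


class KeyRecord(NamedTuple):
    key: str     # copy of the key field
    offset: int  # byte offset of the data record in the logical data file


def build_key_records(records, field):
    """records: list of (offset, fields) pairs. Returns key records sorted by key."""
    return sorted(KeyRecord(fields[field], offset) for offset, fields in records)


# Illustrative data only; offsets and names are made up for the sketch.
sample = [
    (0, {"Symbol": "ABC", "Name": "AgBusCo"}),
    (120, {"Symbol": "BAC", "Name": "Tobacco"}),
    (260, {"Symbol": "CAB", "Name": "TaxiCo"}),
]
name_keys = build_key_records(sample, "Name")
print(name_keys)
```

The data records keep their original order. Only the small key records are sorted, and each one points back to its data record through the offset.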
Source: Key Record Manager, p. 6.


Figure  5.1  Sample  Stock  Market  Database. 


D.  USING  KEY  RECORDS  TO  CREATE  INDEXES

The indexes can be constructed once all the key records have been created from
the respective keys. A complete database would contain indexes for all the data record
key fields. The indexes are in turn placed in index files and are used to access the data
records themselves. The indexes could all be placed in one file or they could be placed
in separate files. Figure 5.3 contains all the indexes generated for the key fields in the
sample database. Note that some of the fields such as Exchange, Date, and Comment
are not key fields and therefore cannot be searched.

E.  SEARCHING  INDEXES

Indexes are a space-saving device because they are made up of key records rather
than whole data records. Only one set of data records need be mastered onto a CD-ROM
disc, with access to the single copy of the data records being made available in a
different order depending on which index is utilized. This requires much less space than
repeating the data records on the disc in different places for different sort sequences.

The data records on a CD-ROM have the sequence shown by their offsets and
always retain that order in the data file. However, the indexes to the data records
are in the order of their keys, which have previously been sorted. Therefore, creating
indexes for the key fields makes it seem as if the data records are arranged in a series of
different orders, one for each index used to access them. In our example, the data index
(Index 0) is used to access the records in their original order. Figure 5.4 shows the
order of the records when indexed by Name (Index 2) and when indexed by Price
(Index 4).

[Figure 5.2: a data record (columns Offset, Symbol, Name, Exc., SIC, Price, Earnings,
Div., Date, Comment) for symbol EBR, "EBanks", and the key record generated from its
Dividend field.]

Source: Key Record Manager, p. 7.

Figure 5.2 Key Record Generation.


Conceptually, the search for a matching key is accomplished by beginning at one
end of the key sequence and searching the keys sequentially towards the other end until
a match or close match is found. For ascending searches, the first key equal to or
greater than the desired key will be retrieved. For descending searches, the first key
equal to or less than the desired key will be retrieved. Thus one could search the Name
index for "Tob" and retrieve "Tobacco" if the search is ascending, or retrieve "TaxiCo" if
the search is descending. In reality it is not a sequential search but is actually a
balanced-tree traversal or hash table look-up. Care should always be taken to design
index structures so that the number of comparisons and accesses can be minimized.
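
The ascending and descending retrieval rules can be sketched with a binary search over a sorted key list. This is a minimal illustration that uses Python's standard `bisect` module in place of the balanced-tree traversal or hash table look-up. The function names and the sample names are assumptions made for the example.

```python
import bisect


def search_ascending(sorted_keys, wanted):
    """Return the first key equal to or greater than wanted, or None."""
    i = bisect.bisect_left(sorted_keys, wanted)
    return sorted_keys[i] if i < len(sorted_keys) else None


def search_descending(sorted_keys, wanted):
    """Return the first key equal to or less than wanted, or None."""
    i = bisect.bisect_right(sorted_keys, wanted)
    return sorted_keys[i - 1] if i > 0 else None


# Illustrative Name index; the entries are made up for the sketch.
names = sorted(["AgBusCo", "EBanks", "TaxiCo", "Tobacco"])
print(search_ascending(names, "Tob"), search_descending(names, "Tob"))
```

An exact match is returned by either search. The two directions differ only when the desired key falls between two keys in the index.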
Figure 5.3 Key Created Indexes.

The records in the example, when accessed by Name (Index 2), would appear to be
ordered by company name. If accessed by Price (Index 4), the apparent order of the data
records would be by price.

[Figure 5.4: the ten sample data records listed in each apparent order, with columns
Offset, Symbol, Name, Exc., SIC, Price, Earnings, Div., Date, and Comment.]

Source: Key Record Manager, pp. 9-10.


Figure 5.4 Searching On Specified Indexes.


F.  KEY  RECORDS  FOR  SPECIAL  PURPOSES
1.  Partial Keying of Data Records

Index performance is generally better when smaller key records are involved.
This is especially true for balanced trees, where larger key records may result in additional
tree levels and therefore cause additional disc accesses. Index size can be greatly reduced in
some cases if some data records are not keyed on every index. Since the Symbol index
in our example database is in the same order as the data records, it becomes possible to
key only the first record in each CD-ROM sector. Then a partial-match search in the
much smaller resulting index could be followed with an exact-match search in the data
records themselves. Index size can also be reduced by not indexing records on key
fields that are blank.
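
The idea of keying only the first record in each sector can be sketched as follows. This is a minimal illustration in which sectors are modelled as short in-memory lists. The function names and sample symbols are hypothetical.

```python
import bisect


def build_sparse_index(sectors):
    """Key only the first record of each sector (sectors hold records sorted by key)."""
    return [sector[0][0] for sector in sectors]


def lookup(sectors, sparse_index, wanted):
    # Partial-match step: find the last sector whose first key is <= wanted.
    i = bisect.bisect_right(sparse_index, wanted) - 1
    if i < 0:
        return None
    # Exact-match step: scan the data records of that single sector.
    for key, record in sectors[i]:
        if key == wanted:
            return record
    return None


# Illustrative data: each inner list stands for one CD-ROM sector.
sectors = [
    [("ABC", "AgBusCo"), ("BAC", "Tobacco")],
    [("CAB", "TaxiCo"), ("CAR", "MeatInc")],
    [("DST", "JeanMan")],
]
sparse = build_sparse_index(sectors)
print(sparse, lookup(sectors, sparse, "CAR"))
```

The index holds one entry per sector instead of one per record, at the cost of a short scan inside the sector that is read.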

2.  Key Records With Extra Information

Key records may contain additional information besides the key and offset
fields. Figure 5.5 displays such a record. A length field may be included for variable-length
records. However, it is not essential because the length of the data record could
be determined by finding the offset of the next data record and subtracting, but this
would require an extra access to the data index (Index 0).
[Figure 5.5: key record layout, including the Key Number field (hashing only) and the
Key field.]

If hash tables are used, a key number is required because the record entries in
a hash table are not arranged by the order of their keys. Hash table keys are
distributed randomly across index pages and are only sorted within a page. The keys in
a balanced-tree are arranged in a fully sorted pattern and therefore do not need a key
number.

One  option  which  can  affect  application  performance  and  disc  overhead  is  that 
key  records  can  also  contain  extra  or  optional  data  for  use  only  by  the  application 
program.  Once  a  key  record  is  located  within  an  index,  the  optional  data  can  be  read 
immediately  from  the  key  record  and  thus  save  an  access  to  the  data  file.  Appending 
extra  data  to  keys  makes  retrieval  of  that  data  very  quick,  once  the  key  is  located.  This 
is  obtained  at  the  expense  of  a  larger  index  which  would  require  a  longer  seek. 
However,  a  second  seek  to  locate  the  additional  data  is  no  longer  necessary. 

3.  Overlapping Keys

Another area in which key record design can affect application performance is
the overlapping of key fields by other key fields. For example, it might be desirable to
allow a date field (Year-Month-Day) to be searchable by various overlapping keys as
seen in Figure 5.6. This overlapped set of keys could be used to search on Year-Month-Day
(Key 1), Month-Day (Key 2), and Day (Key 3) information. By searching
for partial matches Key 1 could also be used to search on Year-Month or Year, and
Key 2 could be used to search on Month. The same searches could be performed with
separate Year, Month, and Day fields, but this would mean searching in three separate
indexes for a Year-Month-Day specification, with much worse than triple the access
time for this index. (Key, 1986, p. 13)
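
A small sketch may make the overlapping keys concrete. It assumes, purely for illustration, that the date field is stored as eight characters in the form YYYYMMDD. The layout in Figure 5.6 may differ, and the function names are hypothetical.

```python
def overlapping_keys(date: str):
    """Cut three overlapping keys from one 'YYYYMMDD' date field (assumed layout)."""
    return {
        1: date[0:8],  # Key 1: Year-Month-Day
        2: date[4:8],  # Key 2: Month-Day
        3: date[6:8],  # Key 3: Day
    }


def partial_match(keys, prefix):
    """Keys that begin with the given prefix, as in a partial-match search."""
    return [k for k in keys if k.startswith(prefix)]


dates = ["19860301", "19860415", "19870301"]
key1 = sorted(overlapping_keys(d)[1] for d in dates)
key2 = sorted(overlapping_keys(d)[2] for d in dates)
print(partial_match(key1, "1986"))  # search on Year through Key 1
print(partial_match(key2, "03"))    # search on Month through Key 2
```

All three keys are views of the same stored field, so a full Year-Month-Day request is answered by a single index.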


VI.  CD-ROM  INDEXING  STRATEGIES 


A.  BALANCED-TREE  INDEXES 
1.  Tree Construction

The general form of a tree structure on a CD-ROM is similar to that of a
broad, shallow balanced-tree. Since CD-ROMs are not concerned with insertions and
deletions, the blocks of the tree can be packed completely full. This results in the tree
using less space and in each block having a larger number of children. Moreover, a
broader, shallower tree is produced.

If balanced-trees are built by inserting records randomly and if procedures
developed for handling the growth of dynamic trees are used, the blocks of the tree will
be between 50 and 100 percent full with an average utilization of between 67 and 83
percent (Zoellick, 1986, p. 184). That is, trees will contain blocks that are not
completely full. A special tree-loading procedure that does not use the normal
block-splitting method involved in balanced-tree insertion is needed.

The  first  step  in  developing  an  appropriate  tree-loading  procedure  is  to  sort  all 
the  records  by  their  keys  as  discussed  in  Chapter  Five.  The  sorted  records  are  then 
written  one  at  a  time  into  the  leftmost  block  at  the  lowest  level  of  the  tree.  When  that 
block  is  full  it  is  written  out  to  disc.  The  next  record  goes  into  a  parent  block.  Then  the 
next  block  at  leaf  level  is  filled.  When  this  second  leaf  block  is  full,  it  is  written  out  to 
disc  and  another  single  record  is  placed  in  the  parent  block.  This  process  continues 
until all the records have been loaded. Figure 6.1 shows that all the records are
arranged in the blocks in a numbered sequence.
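
The loading procedure just described can be sketched in a few lines. This is a simplified illustration, not the procedure used by any particular vendor: blocks are modelled as Python lists, "writing to disc" is modelled by opening a new list, and the name `bulk_load` is hypothetical. The same rule is applied at every level, so a full parent block passes the next record up to its own parent.

```python
def bulk_load(sorted_keys, block_size):
    """Load sorted keys into completely full blocks.

    Returns a list of levels, where levels[0] holds the leaf blocks and the
    last level holds the root block.
    """
    levels = [[[]]]  # one open, empty block at leaf level

    def place(key, level):
        if level == len(levels):
            levels.append([[]])          # the tree grows a new top level
        block = levels[level][-1]
        if len(block) < block_size:
            block.append(key)            # room left in the open block
        else:
            place(key, level + 1)        # block is full: key goes to the parent
            levels[level].append([])     # and a fresh block opens at this level

    for key in sorted_keys:
        place(key, 0)
    return levels


levels = bulk_load(list(range(1, 9)), block_size=2)
print(levels)
```

With a block size of two, records 1 and 2 fill the first leaf, record 3 goes to the parent, records 4 and 5 fill the second leaf, and so on, which is the numbered sequence the text describes. An input whose length does not fit the tree exactly can leave a trailing empty block, which a fuller implementation would need to handle.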

The  primary  advantage  of  this  loading  procedure  is  that  it  capitalizes  on  the 
read-only  nature  of  the  CD-ROM  by  building  a  shallow  tree  and  avoiding  seeks.  There 
is  also  an  important  second  advantage.  If  each  block  is  written  out  as  soon  as  it  is  full, 
then  parent  blocks  will  be  stored  in  close  proximity  to  their  children,  making  use  of  the 
CD-ROM's  better  performance  on  short,  local  seeks.  Furthermore,  the  proximity  of 
parents and children will never be threatened since the balanced-trees used for CD-ROM
are not dynamic.

There are other possibilities for decreasing seeks if something is known about
the distribution of requests for the records stored in the tree. Say, for example, that it
is known that 85 percent of the requests are for 10 percent of the records. The number
of seeks can be greatly reduced if the tree-loading procedure can be designed to place
the most frequently used records as near the root as possible.

Figure 6.1 Properly Loaded Balanced-Trees.

2.  Tree  Optimization  Formulas 

The following formulas were used by Reference Technology in designing the
TLOCD database:

•  $L \ge \log(N + 1) / \log(P + 1)$, where L is the number of tree levels

•  $P \ge (N + 1)^{1/L} - 1$, where P is the number of key records in an index page

•  $N \le (P + 1)^{L} - 1$, where N is the number of key records in the index

These formulas relate number of key records, number of tree levels, and page size and
are used to optimize balanced-tree performance for CD-ROM databases. Table 4
displays examples of how the formulas can be used.
TABLE 4

OPTIMIZING BALANCED-TREE PERFORMANCE

Given the number of records, page size, and key record size, the minimum number of
tree levels can be calculated:

Number of key records = 100,000,000
Page size = 4096 bytes
Key record size = 8 bytes

$N + 1 = 100,000,001$

$P + 1 = (4096 / 8) + 1 = 513$

$L \ge \log(N + 1) / \log(P + 1) = 2.95$

Since there must be an integral number of levels, 3 levels are required.


Given the number of tree levels, number of records, and record size, the minimum page
size can be calculated:

Number of tree levels = 2
Number of key records = 2,000,000
Key record size = 32 bytes

$N + 1 = 2,000,001$

$1 / L = 1 / 2 = .5$

$P \ge (N + 1)^{1/L} - 1 = 1413.21$

Since there must be an integral number of records on a page, the page size must
be large enough for 1414 records. If the page size is to be divisible by 2048 bytes (the
CD-ROM sector size), a 47,104-byte page size is needed.


Given the number of levels, page size, and key record size, the maximum number of
records can be determined:

Number of tree levels = 2
Page size = 4096 bytes
Key record size = 8 bytes

$P + 1 = (4096 / 8) + 1 = 513$

$N \le (P + 1)^{L} - 1 = 263,168$

At most 263,168 records can be placed in this tree.
Source: Key Record Manager, pp. 21-22.
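
The three calculations in Table 4 can be reproduced with integer arithmetic. The sketch below is an illustration with hypothetical function names. It searches upward for the smallest integer that satisfies each inequality rather than rounding a logarithm, which avoids floating-point edge cases.

```python
SECTOR_BYTES = 2048  # CD-ROM sector size quoted in Table 4


def max_records(page_records, levels):
    """N <= (P + 1)^L - 1"""
    return (page_records + 1) ** levels - 1


def min_levels(n_records, page_records):
    """Smallest L for which (P + 1)^L - 1 >= N."""
    levels = 1
    while max_records(page_records, levels) < n_records:
        levels += 1
    return levels


def min_page_records(n_records, levels):
    """Smallest P for which (P + 1)^L - 1 >= N."""
    p = 1
    while max_records(p, levels) < n_records:
        p += 1
    return p


def page_bytes(page_records, key_bytes):
    """Round the page size up to a whole number of sectors."""
    raw = page_records * key_bytes
    return -(-raw // SECTOR_BYTES) * SECTOR_BYTES


print(min_levels(100_000_000, 4096 // 8))
print(min_page_records(2_000_000, 2), page_bytes(1414, 32))
print(max_records(4096 // 8, 2))
```

The results agree with the table: 3 levels, 1414 records per page in a 47,104-byte page, and at most 263,168 records in a two-level tree.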


B.  HASHED  INDEXES
1.  Overflow Avoidance

Hashing fits the strengths and weaknesses of the CD-ROM perfectly for
applications that do not need to access records in order by key. It consists of using a
function to transform each record's key into a bucket address within the file. In order
to find a particular record, the function is applied to that record's key, and the bucket
at the resulting address is then retrieved. Hashing works well and permits single-seek
retrievals as long as there is room for each record in its associated bucket.
The following variables can be manipulated to guarantee that overflow does not
happen:

•  the packing density of the hashed storage
•  the size of the bucket
•  the design of the hash function

Packing  density  and  bucket  size  are  discussed  further  in  the  next  chapter. 

2.  Hashing  Functions 

Since  CD-ROM  is  a  read  only  medium,  there  exists  a  complete  list  of  the  keys 
to  be  hashed  before  the  file  is  built.  The  keys  can  be  analyzed  to  discover  functions 
that  would  distribute  them  more  uniformly  than  a  random  function  would.  A  perfectly 
uniform  distribution  would  place  an  equal  number  of  records  in  each  bucket  and 
guarantee no overflow even at a packing density of 100 percent. Although developing
such  a  function  can  be  very  time-consuming,  an  economical  way  of  improving  on 
purely  random  distributions  can  often  be  found. 

The  CD-ROM's  read-only  nature  makes  it  possible  to  optimize  a  hash 
function.  It  is  also  practical  because  large  computers  operating  in  a  batch  mode  can  be 
used to create the data set that will be used interactively by small computers.
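
Because the complete key list is available before mastering, candidate hash functions can simply be tried against it and the most uniform one kept. The sketch below illustrates that idea under stated assumptions: the polynomial string hash, the candidate multipliers, and all function names are chosen for the example and are not taken from any CD-ROM product.

```python
def make_hash(multiplier, buckets):
    """Return a simple polynomial string hash that maps a key to a bucket number."""
    def h(key: str) -> int:
        value = 0
        for ch in key:
            value = (value * multiplier + ord(ch)) % 1_000_003
        return value % buckets
    return h


def worst_bucket_load(keys, h, buckets):
    """Number of keys in the fullest bucket; overflow occurs if this exceeds capacity."""
    loads = [0] * buckets
    for key in keys:
        loads[h(key)] += 1
    return max(loads)


def pick_multiplier(keys, buckets, candidates=(31, 33, 37, 131, 257)):
    """Try each candidate on the complete key list and keep the most uniform one."""
    return min(candidates,
               key=lambda m: worst_bucket_load(keys, make_hash(m, buckets), buckets))


keys = [f"KEY{i:04d}" for i in range(500)]
best = pick_multiplier(keys, buckets=64)
print(best, worst_bucket_load(keys, make_hash(best, 64), 64))
```

This batch-mode analysis is done once, when the disc is prepared. The chosen function is then fixed and used by the retrieval software.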

C.  INVERTED  INDEXES 

Inverted files are ideally suited for full-text fields and for structured fields
containing repeating key values, because they save index space. A copy of each
key value is stored in an index along with a pointer to a list of all records associated
with the key. The Comments field in applicable databases is normally a full-text field
and a good candidate for an inverted index. If each word is used as a key in a key
record file, the same words will occur over and over again and create a very large index.
An inverted file stores each word only once to represent all of its occurrences and
results in a much smaller index. Figure 6.2 represents an inverted index for words
beginning with A and B from a fictitious database.
Source: CD ROM Optical Publishing, p. 11


Figure 6.2 Inverted Indexing.
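
A minimal sketch of an inverted index over a full-text field follows. The record numbers and comment texts are invented for the illustration, and `build_inverted_index` is a hypothetical name. Each word is stored once, together with the list of record numbers in which it occurs.

```python
from collections import defaultdict


def build_inverted_index(records):
    """records: dict of record number -> text. Returns word -> sorted record numbers."""
    postings = defaultdict(set)
    for number, text in records.items():
        for word in text.lower().split():
            word = word.strip('.,;:"()')
            if word:
                postings[word].add(number)
    return {word: sorted(numbers) for word, numbers in sorted(postings.items())}


comments = {
    1: "Regional banking",
    2: "Chewing tobacco",
    3: "Regional meat products",
}
index = build_inverted_index(comments)

# A Boolean AND is an intersection of two record lists.
both = sorted(set(index["regional"]) & set(index["banking"]))
print(index["regional"], both)
```

Because the record lists for a key are kept together, Boolean combinations reduce to set operations on those lists.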


Such sophisticated indexing schemes can sometimes require as much space as the
data itself, or more. The Grolier Electronic Encyclopedia requires 60 megabytes to
accommodate the text and 50 megabytes to accommodate the indexing. (Dixon, 1987)

D.  CHOOSING  THE  PROPER  INDEX  STRUCTURE

Because CD-ROM discs are a read-only medium, the choice of index structure
must be made when the database is designed. It is possible to use more than one type
of index on a single database, so that it becomes feasible to choose whichever type
offers the best performance for individual key fields.

Balanced-tree schemes are best for applications where partial-match searches are
frequent. This is because the index can be ordered by the key value. They also perform
well in exact-match applications when it is desirable to minimize the index size.
Balanced-trees used in CD-ROM applications waste no space and can typically acquire
any key in the index with only two or three accesses.


Hash tables perform best when the quick access of exact matches is the main
consideration. Normally, hash tables can be constructed so that only one disc access is
sufficient. However, hash-table indexes are not as compact as balanced-trees and will
typically be 20 to 50 percent larger than a comparable balanced-tree index.
Furthermore, hash tables perform partial-match searches poorly because it is nearly the
same as searching a sequential file. (Colvin, 1987, p. 115)

Boolean and relational operations on CD-ROM discs are best supported by
inverted files. Either hash tables or balanced trees can be used to create the files. Since
all data record numbers containing a particular key value are listed together in an
inverted file, it must be loaded into a rather large memory buffer to minimize accesses
to the CD-ROM.

The  index  structure  used  in  the  development  of  TLOCD  was  a  combination  of  a 
balanced-tree  and  a  hash  table.  In  this  way  the  time  required  to  perform  both  partial 
and  exact-match  searches  could  be  minimized. 




VII.  CD-ROM  FILE  MANAGEMENT 


A.  GENERAL  DIRECTORY  STRUCTURE

The High Sierra Standard entails a hierarchical structure of descending
subdirectories branching down from the parent directory. This directory structure is
called a "Standard File Structure," and there must be only one per CD-ROM disc. A
path table operates as an index to each subdirectory and provides a pointer to the
logical block number where the subdirectory is located. A path table obviates the need
to sort each level of the directory hierarchy in the search through the directory under
certain circumstances. If the path table can be contained in RAM, it permits one-seek
access to the subdirectory of interest. This occurs when the subdirectory names are
short enough and the number of subdirectories small enough so that the path table can
reside in one logical sector. (Approximately 128 subdirectory names of eight characters
each will cause the path table size to be about 2048 bytes or one logical sector.) Thus,
given an eight-level tree, holding a path table in RAM saves seven seeks. (Standard,
1986, p. 2.4)


B.  DIRECTORY  STRUCTURE  DESIGN 
1.  Multiple-File Explicit Hierarchies

This type of directory structure is used by UNIX, MS-DOS, VMS, and other
magnetic disk systems. Early versions of Digital Equipment's UNIFILE system are an
example of a CD-ROM file system that used this kind of directory structure. This
particular structure, as shown in Figure 7.1, allows subdirectories to be treated as files. It
is an excellent system for magnetic disks because it provides the flexibility required in
order to add new subdirectories and delete old ones. However, CD-ROMs do not
require such flexibility. Furthermore, we cannot afford the time to seek from
subdirectory file to subdirectory file in order to find a file with a long path name such
as:

/johnson/programs/source/acctg/ledger/post.c

The strong features of this type of directory structure are familiarity and the
fact that it handles generic searches reasonably well. Moreover, by taking advantage of
the CD-ROM's read-only nature, the files in each subdirectory can be sorted to
improve generic searching even more.

The main disadvantage is that we must search through an entire level of the
directory structure while looking for a file. If all the files are in the root then a search
for a single file would involve the whole directory. Even if the files are sorted within
each directory level, a binary search of a large single-level directory containing 10,000
files would require a dozen or more seeks back and forth across the sectors that make
up the directory.

2.  Single-File  Explicit  Hierarchies 

This approach to directory hierarchies involves placing the entire directory
structure in a single file. The root directory and all subdirectories are treated as
records within a file rather than separate files. Figure 7.2 represents this type of
structure, which was used in the first version of LaserDOS, a tree-oriented system
designed by TMS, Inc. for optical discs. The left pointers from the subdirectory records
point to elements in the subdirectory. Right pointers always point to files or
subdirectories at the same level.

Source: CD-ROM The New Papyrus, p. 116.


Figure 7.2 Single-File Explicit Hierarchy.


The important benefit realized from compressing the directory hierarchy into a
single file, rather than spreading it out by using a different file for each subdirectory, is
that we can often cut down on the number of seeks required to open a file. A
somewhat small directory containing no more than two hundred files can be contained
in just two or three sectors which could easily fit in RAM. This holds true even if there
are many levels of subdirectories. Therefore, the single-file explicit hierarchy can often
improve on the performance of multiple-file explicit hierarchies when opening files that
have path names containing several subdirectory levels.

3.  Hashed  Directories 

Any file can be opened in one seek if we hash the entire path and file name to
an address within the directory. This will work even if there are tens of thousands of
files on the disc. A hash function would transform the character string representing the
path and file name into the address of a hash bucket. A seek to the directory bucket
would gain access to the information needed to open a file.

If  the  hash  buckets  can  be  prevented  from  overflowing,  then  it  can  be 
guaranteed  that  the  hashing  procedure  would  require  no  more  than  a  single  seek.  If 
overflow  occurred,  one  or  more  seeks  would  be  required  in  order  to  locate  the 
information  that  had  to  be  stored  elsewhere.  The  read-only  nature  of  the  CD-ROM 
makes  it  possible  to  manipulate  the  packing  density  of  the  directory  file.  Overflow  can 
be  avoided  by  placing  a  small  number  of  records  into  a  large  file.  The  more  tightly  a 
file  is  packed,  the  more  likely  it  is  that  at  least  one  bucket  will  overflow.  The  bucket 
size  also  affects  overflow.  No  overflow  could  be  guaranteed  if  the  entire  file  was 
considered  to  be  a  single  bucket.  Unfortunately,  the  entire  file  would  have  to  be  read 
into  and  processed  in  RAM. 
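
The trade-off between packing density and overflow can be sketched as follows. This is an illustration only. It uses `zlib.crc32` from the Python standard library as a stand-in hash function, assumes all path names are distinct, and simply lowers the packing density (adds buckets) until no bucket overflows. The function names and generated path names are hypothetical.

```python
import zlib


def bucket_of(path: str, buckets: int) -> int:
    """Hash the entire path and file name to a bucket address."""
    return zlib.crc32(path.encode("ascii")) % buckets


def build_directory(paths, bucket_capacity):
    """Add buckets, starting from 100 percent packing density, until nothing overflows."""
    buckets = max(1, -(-len(paths) // bucket_capacity))
    while True:
        table = [[] for _ in range(buckets)]
        for path in paths:
            table[bucket_of(path, buckets)].append(path)
        if all(len(bucket) <= bucket_capacity for bucket in table):
            return table
        buckets += 1


paths = [f"/dir{i % 7}/file{i}.txt" for i in range(200)]
table = build_directory(paths, bucket_capacity=8)
density = len(paths) / (len(table) * 8)
print(len(table), round(density, 2))
```

Since the directory is built once and never updated, this search for an overflow-free layout is performed at mastering time. Every later open then costs a single seek to the bucket returned by `bucket_of`.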

4.  Indexed  Directories 

The  key  to  the  success  of  this  approach  is  a  structure  called  a  path  table.  The 
path  table  provides  a  compact  mechanism  for  quick  translation  of  the  full  path  for  a 
subdirectory  into  an  integer  called  the  path  identifier.  The  path  identifier  is  actually  the 
relative  position  of  each  file  obtained  from  a  level  order  traversal  of  the  directory 
hierarchy. By examining Figure 7.3, the path identifiers for the following path names
can be determined:

•  /strlib = 1
•  /text
•  /strlib/obj
•  /strlib/source = 4
•  /reports = 10
•  /text/specs/input = 14

The path table's ability to compress an entire directory path into a two-byte integer
guarantees that directory records can be kept relatively short and that many directory
records can be put into each block of the directory structure.
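
A level order traversal that assigns path identifiers can be sketched as below. The directory tree is hypothetical and does not reproduce Figure 7.3. Numbering the root as 0 and visiting subdirectories in name order within each parent are assumptions made for the example.

```python
from collections import deque


def build_path_table(tree):
    """tree: nested dict of subdirectory name -> subtree. Returns path -> path identifier."""
    table = {"/": 0}                 # assumed: the root receives identifier 0
    queue = deque([("", tree)])
    next_id = 1
    while queue:
        prefix, subtree = queue.popleft()
        for name in sorted(subtree):
            path = f"{prefix}/{name}"
            table[path] = next_id    # identifiers follow level order
            next_id += 1
            queue.append((path, subtree[name]))
    return table


# Hypothetical directory hierarchy used only for this sketch.
tree = {
    "strlib": {"obj": {}, "source": {}},
    "text": {"reports": {}, "specs": {"input": {}}},
}
path_table = build_path_table(tree)
print(path_table)
```

Each full path is replaced by a small integer, so a directory record needs only that integer and the file name.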
Source: CD-ROM The New Papyrus, p. 119.


Figure  7.3  Index  Path  File  Structure. 

After  performing  an  average  seek  of  about  .5  seconds,  a  minimum  of  one  two- 
byte  sector  is  read  in  from  the  disc.  For  an  additional  cost  of  six  milliseconds  another 
OR  bytes  can  be  rend  in.  making  a  total  of  SK  bytes  in  all.  If  the  si/.c  of  the  directory 
records  can  be  held  to  32  bytes  each,  then  each  seek  out  to  the  CD  ROM  can  bring  in 
as  many  as  2.50  records  for  an  SR  block. 

The file records are placed into the blocks of a file table, which contains the
information needed to open any file in the file system. They are arranged according to
their path identifier, which was extracted from the path table. As a result, all the files in
a single subdirectory are grouped together (i.e., they have the same path identifier) and
ordered by name. This structure supports efficient generic and binary searching.
When a particular file is to be opened, we need to find the block in the file
table that has the record corresponding to the desired path identifier and file name. The
costly part of the file search is the seek to the block's beginning, so it is desirable to find
the right block on the first attempt. To ensure this occurs, an index table is used to record
the path identifiers and file names that fall at the block boundaries. Figure 7.4 displays an
overview of the contents of the file table. Now suppose the file to be opened is:

/strlib/source/strchop.c

It is shown in Figure 7.4 that the request starts at the path table and converts the path
name into a path identifier of '4'. The index table is then searched for "4strchop.c".
Since the value of "4strchop.c" is less than the first entry (alphabetically), the search
follows the first pointer from the index table to the first block in the file table, where it
finds the location of the file and other information required to open it.

Figure 7.4 Using an Indexed Path Directory.
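
The lookup just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the layout of any actual product: keys are modeled as (path identifier, file name) tuples rather than the concatenated string "4strchop.c", each index entry is assumed to be the highest key stored in its block, each block is modeled as a Python dict, and all file names and locations are invented.

```python
from bisect import bisect_left


def find_block(index_table, path_id, file_name):
    """Return the number of the only block that could hold the key.

    `index_table` is held in RAM and lists, in order, the highest
    (path_id, file_name) key stored in each block of the file table.
    """
    block_no = bisect_left(index_table, (path_id, file_name))
    if block_no == len(index_table):
        return None  # the key sorts after every block boundary
    return block_no


def open_file(file_table, index_table, path_table, path, file_name):
    """Translate the path, pick the block, then read that one block."""
    path_id = path_table[path]                 # path table lookup in RAM
    block_no = find_block(index_table, path_id, file_name)
    if block_no is None:
        return None
    block = file_table[block_no]               # stands in for the single seek
    return block.get((path_id, file_name))


# Hypothetical contents for illustration only.
path_table = {"/text": 2, "/strlib/source": 4, "/reports": 10}
file_table = [
    {(2, "notes.txt"): 100, (4, "strchop.c"): 220, (4, "strcopy.c"): 260},
    {(4, "strlen.c"): 300, (10, "june.rpt"): 410},
]
index_table = [(4, "strcopy.c"), (10, "june.rpt")]

location = open_file(file_table, index_table, path_table,
                     "/strlib/source", "strchop.c")
```

In this example the key (4, "strchop.c") sorts before the first index entry, so the first block is chosen and the file is found there without touching any other block.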


The path table's compression ability allows for short directory records so that
many of them can be packed into each block of the file table. This reduces the total
number of blocks required for the file table. A small file table will result in a small
index table. It would be very desirable to store both the index and path tables in RAM
rather than forcing a disc seek every time we needed them. In this way the indexed
directory allows the opening of any file with only one seek to the CD-ROM.

C.  BLOCKS  AND  BUFFERS 
1. Determining Block Size

A general rule applying to any file structure design is to make each disc seek
as profitable as possible. This is the reason why paged structures such as balanced trees
are commonly utilized. Each access to the disc retrieves enough data to make decisions
about the next tree level instead of making a simple two-way choice as in a binary tree.
The disc is never accessed to retrieve only one record but to retrieve a block of records
that can be read and processed much faster in RAM. Even though CD-ROM seeks
slowly, it can acquire a large block of data at an acceptable rate. Therefore, the choice
of the block size is extremely important.

Both  physical  and  logical  design  factors  should  be  considered  when  selecting  a 
block  size.  Consider  the  effect  of  page  size  on  the  depth  of  the  trees  previously  shown 
in Figure 6.1. A page that holds N records can have N+1 children. The first tree in
Figure 6.1 has a height of two levels and holds eight records. This height is ideal
because storing the tree's root page in RAM ensures a one-seek retrieval of any record
in the tree. Records can be added to the tree by adding more levels. However, this will
increase the average number of seeks required for searching. A better plan calls for
increasing  the  block  size  to  accommodate  more  records.  The  second  tree  in  Figure  6.1 
shows  the  result  of  doubling  the  block  size. 

Since  the  CD-ROM  is  read-only,  it  is  known  exactly  how  many  records  are 
going  to  be  put  into  the  tree  before  it  is  built.  For  example,  storing  50,000  32-byte 
records  and  using  a  block  size  of  2K  will  result  in  a  three-level  tree.  A  two-level  tree 
can be built if a block size of 8K is used. It takes longer to read a larger block, but
since CD-ROMs can read data at 150K bytes per second, reading an additional 6K
bytes takes only 40 milliseconds. This is a small price to pay in return for avoiding an
additional 500 millisecond seek. Minimizing the number of seeks is the logical
consideration  for  using  large  block  sizes.  However,  the  CD-ROM's  physical  features 
should  also  be  considered  in  determining  what  block  size  to  use. 
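
The figures in this paragraph can be checked with a short calculation. The sketch below assumes that every page, interior or leaf, holds N records and has N+1 children (as stated for Figure 6.1), that 1K means 1024 bytes, and that the transfer rate is the 150K bytes per second quoted above. The function names are invented for the illustration.

```python
def levels_needed(num_records, record_size, block_size):
    """Smallest tree height whose pages can hold `num_records` records."""
    per_page = block_size // record_size        # N records per page
    capacity = 0
    pages_on_level = 1                          # the root
    levels = 0
    while capacity < num_records:
        levels += 1
        capacity += pages_on_level * per_page
        pages_on_level *= per_page + 1          # each page has N + 1 children
    return levels


def transfer_ms(num_bytes, rate_bytes_per_s=150 * 1024):
    """Time in milliseconds to stream `num_bytes` once the seek is done."""
    return 1000.0 * num_bytes / rate_bytes_per_s


levels_2k = levels_needed(50_000, 32, 2 * 1024)   # three levels
levels_8k = levels_needed(50_000, 32, 8 * 1024)   # two levels
extra_read = transfer_ms(6 * 1024)                # 40 ms for the extra 6K
```

Under these assumptions the larger block trades roughly 40 milliseconds of extra transfer time for one fewer seek of about 500 milliseconds.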

Since the sector size for a CD-ROM is 2K bytes, the smallest block size that
should ever be considered is also 2K bytes. This is because even if only one
byte is needed, 2K bytes will be retrieved. An effective operating system will transfer
the data directly into an application program's work area with no intermediate data
movement. So what happens if a program requests only 64 bytes, or some other sector
fragment? In this case the operating system cannot assume that the application
program has allotted enough space to hold an entire 2048-byte sector. A system buffer
must be used to hold the complete sector until the 64 bytes desired can be transferred
to the application's work area. Data must be handled or moved twice when anything
less than a complete sector is requested. Therefore, in order to avoid unnecessary data
movement, a block size that is a multiple of the 2K-byte sector size should be used.
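
The sector-alignment rule can be expressed as two small helper functions. These are hypothetical helpers written for illustration only; they model the rule stated in the text and do not describe how any particular operating system decides when to use its buffers.

```python
SECTOR_SIZE = 2048  # bytes per CD-ROM sector, as stated in the text


def round_up_to_sector(num_bytes):
    """Smallest multiple of the sector size that holds `num_bytes`."""
    return -(-num_bytes // SECTOR_SIZE) * SECTOR_SIZE


def needs_system_buffer(offset, length):
    """True when a read covers only part of a sector.

    A partial-sector request forces the sector into an intermediate
    buffer first, so the data is moved twice.
    """
    return offset % SECTOR_SIZE != 0 or length % SECTOR_SIZE != 0
```

For example, a 64-byte request would need the intermediate buffer, while an 8K request starting on a sector boundary would not.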

2.  Buffer  Usage 

Reading data in multiples of the sector size bypasses the system
buffers. This prevents the operating system from keeping recently used data in RAM.
To see what is lost, consider what happens when a 256-byte record is read: the
operating system uses one of its system buffers to hold the sector containing the
record. Now another 256-byte record is read in from a different sector. This new sector
is placed in a different buffer. The program now calls for a third record which happens
to be in the sector that was placed in the first buffer. Therefore, no seek is required for
the third record because its sector is already buffered in RAM.

Now suppose that instead of reading fragmented records, 2K bytes are read to
avoid moving data twice. In this case, system buffers are not used because the data
goes directly to the application work area. Consequently, a new sector would be read on
top of the first one. To benefit from buffering with CD-ROM, the
decision of how many buffers to provide and how to manage them must reflect the
nature  of  the  application.  If  the  application  searches  through  tree-structured  indexes  or 
works  in  both  directions  through  a  sequence,  it  can  benefit  from  a  large  number  of 
buffers.  If  the  application  moves  sequentially  through  the  data  in  one  direction  it  will 
not  benefit  from  buffering  at  all. 

Reference  Technology  utilizes  a  general  purpose  buffering  scheme  known  as 
Least  Recently  Used  (LRU)  replacement.  Information  in  the  buffers  is  retained  for  user 
access  until  buffered  data  are  replaced,  according  to  the  least  recently  accessed 
protocol. Best performance occurs when the page size is the same as the buffer size and
when the number of buffers selected is sufficient to retain the most frequently accessed
pages  in  memory. 
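
A least-recently-used buffer pool of the general kind described can be sketched as below. This is a generic illustration of LRU replacement, not Reference Technology's implementation; the class name, the `read_page` callback, and the seek counter are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Keeps the most recently used pages in a fixed number of buffers."""

    def __init__(self, num_buffers, read_page):
        if num_buffers < 1:
            raise ValueError("at least one buffer is required")
        self.num_buffers = num_buffers
        self.read_page = read_page      # callable that fetches a page from disc
        self.buffers = OrderedDict()    # page number -> page data, oldest first
        self.seeks = 0                  # counts reads that had to go to the disc

    def get(self, page_no):
        if page_no in self.buffers:
            # Hit: mark the page as most recently used, no disc access.
            self.buffers.move_to_end(page_no)
            return self.buffers[page_no]
        # Miss: go to the disc, evicting the least recently used page if full.
        self.seeks += 1
        data = self.read_page(page_no)
        if len(self.buffers) >= self.num_buffers:
            self.buffers.popitem(last=False)
        self.buffers[page_no] = data
        return data
```

With two buffers, requesting pages 1, 2, and then 1 again costs only two disc accesses, which mirrors the three-record example given earlier in this section.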

Because applications differ, it is impossible to ensure that the most frequently
accessed pages will always remain in the buffers. A procedure is needed that will select
the minimum number of buffers for maximum performance. Such a procedure would
require that there be at least one more buffer than the number of levels in the tree.
Also, there should be at least two buffers for each hash table. The extra buffer per
index will hold the data record, while the others hold the index pages. Thus, if two tree
indexes with two levels each and two hash indexes were frequently accessed, all with
4096-byte index pages, then 12 buffers of 4096 bytes each would be the
minimum configuration for best performance. (Key, 1986, p. 23)

D.  MULTI-VOLUME  DISCS 

1.  Adding  Additional  Discs 

A CD-ROM disc is described, according to the High Sierra standard, as a
volume (Standard, 1986, p. 2.5). The standard allows for multi-volume sets of discs,
which are of two basic kinds. The first is the type of multi-volume set designed to hold
a single massive database that exceeds the capacity of a single disc. The path table and
directory structure on each volume of this kind are required to be the same. In this way,
the location of any file in the set can be found by reading the directory from any one
of the discs. Clearly, it may become necessary to mount a different CD-ROM disc from
the set in order to read that file. However, the presence of identical path tables and
directories avoids the need to mount disc after disc to find the file of interest.

The second type of multi-volume set of CD-ROMs is necessitated by the need
to update files or add new volumes to an existing volume set. If this is the case, the
most recent volume's path and directory information must supersede that of all
previous volumes. Moreover, the last volume in the set must be mounted when the
system is booted in order to supply the system with the freshest information. By
deleting references to a file, or including references to a file in the directory structure of
the latest disc in the updating volume set, existing files can be "deleted," "modified," or
"replaced." They actually still exist on the earlier discs, but since the latest directory no
longer points to them, they are no longer available to the system. Although physically
present for the life of the CD-ROM, they are logically lost or altered under the present
configuration when the new volume is mounted. However, they can be restored if an
earlier volume in the set is mounted at system start-up.
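
The effect of an updating volume set can be modeled in a few lines. The sketch below is a simplified illustration, not the High Sierra on-disc format: each volume's directory is modeled as a dictionary from path to a (volume number, starting sector) pair, and every path and location shown is invented.

```python
def locate(volume_set, mounted_volume, path):
    """Look a file up in the directory of whichever volume was mounted.

    Only the mounted volume's directory is consulted, so the newest
    volume decides which files are visible and where they live.
    """
    directory = volume_set[mounted_volume]
    return directory.get(path)


# Hypothetical two-volume updating set.
volume_set = {
    1: {"/parts/list.dat": (1, 500), "/parts/old.dat": (1, 900)},
    # Volume 2 replaces list.dat and simply omits old.dat.
    2: {"/parts/list.dat": (2, 40)},
}

replaced = locate(volume_set, 2, "/parts/list.dat")   # points at volume 2
deleted = locate(volume_set, 2, "/parts/old.dat")     # no longer visible
restored = locate(volume_set, 1, "/parts/old.dat")    # visible again
```

The data for old.dat never leaves volume 1; it becomes invisible only because the directory on volume 2 no longer refers to it.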

2.  Extended  Attribute  Records 

CD-ROM file management that is supported within operating systems such as
PC-DOS sees optical disc data as simply a stream of bytes. For other operating
environments, extended attribute records (XARs) can provide additional information
about the file and its structure. An XAR is an optional attachment to the beginning of
a file, containing extra information about that file. Examples of such additional data
include creation and expiration dates, access control, record structure, record
attributes, and application-specific information.

One particular use of XARs is to control which version of a file is to be used
when there is a multi-volume set of discs containing several versions of a file. This
works because the XAR affixed to the last extent of a given file supersedes the XARs
affixed to all the previous extents of that file. If there is no XAR with the last file
extent, the XARs with preceding extents are ignored. Thus, by altering the XAR for
the final extent of a file, the incidental information about a file is effectively updated
when a new CD-ROM is issued.

Another use of XARs is to restrict who may read certain files on a disc. The
standard is similar to the VMS "system, owner, group, world" permission scheme. It
should be noted that access restriction only works under those operating systems that
recognize it. If someone carries a disc with restricted files to a computer whose
operating system, like MS-DOS, does not recognize access protection, the system will
read the disc, regardless of the setting of the XAR. Consequently, designing access
restriction into a disc must be coupled with a plan to restrict the physical distribution
of the discs. (Standard, 1986, p. 2.3)


VIII.  CD-ROM  APPLICATION  SOFTWARE  CONSIDERATIONS 


A.  FILE  SYSTEM  SUPPORT 

1.  Origination  Software 

Before making a CD-ROM, the files that will appear on the disc must be
assembled  according  to  the  rules  of  the  logical  format.  Origination  software  does  this 
work,  providing  the  writing  component  of  the  file  system. 

At  the  present  time,  most  origination  software  runs  on  minicomputers  in 
batch  mode.  Figure  8.1  shows  the  relationship  of  the  four  principal  components  of 
IMS's  LaserDOS  origination  system.  The  user  begins  with  a  Specify  program  that 
provides  an  interactive  shell-like  mechanism  for  creating  the  directory  hierarchy  that  is 
to  be  used  on  the  CD-ROM.  During  this  step  the  user  can  indicate  which  files  are  to 
go  in  which  subdirectories.  The  specification  is  used  as  input  to  a  Load  process  that 
reads  user  files  from  tape  and  magnetic  disk  to  create  a  disc  image,  complete  with  a 
volume  table  of  contents  and  directory  structure  in  the  logical  format  that  will  be  used 
on  the  CD-ROM.  After  loading,  the  user  can  run  a  Verify  program  that  automatically 
checks  the  internal  consistency  and  integrity  of  the  disc  image.  The  user  can  also  run  a 
Shell  program  that  exercises  the  image  of  the  CD-ROM  file  system  interactively, 
allowing  the  user  to  dump  out  the  contents  of  individual  files,  copy  files  to  the  host 
operating  system,  and  so  on. 

2.  Destination  Software 

Destination  software  is  the  reading  component  of  the  file  system.  It 
understands  the  logical  format  and  uses  it  to  provide  access  to  the  CD-ROM  files.  One 
way  to  approach  the  design  of  destination  software  is  to  create  a  file  manager  program 
containing  special  function  calls  that  are  exclusively  for  use  with  the  CD-ROM  and 
which bear no relationship to the system calls provided by the host operating system
(Zoellick, 1986, p. 125). The advantage of this approach is that the file manager and
application programs that use it are not affected by changes in the operating system,
thus allowing a higher degree of portability. The main disadvantage is that applications
cannot access the CD-ROM through standard system calls, which in turn prevents
access via high-level language I/O facilities. This makes the CD-ROM less user friendly
since familiar language tools and system utilities are unavailable.
Another design approach involves software such as IMS's LaserDOS and
Reference Technology's Standard File Manager, which are implemented for use with
MS-DOS (Zoellick, 1986, p. 12[illegible]). The intent of this approach is to cooperate
with the host system as much as possible. For example, LaserDOS traps all system calls
and determines if the call is CD-ROM related. If it is CD-ROM related it will handle the
call itself. If it is not, it simply passes the call on to MS-DOS for completion. The
calling software cannot tell the difference. Reference Technology's
Standard File Manager works similarly in the TLOCD system. The CD-ROM appears
as just another disk drive to the TLOCD user.
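
The trap-and-forward idea can be illustrated with a small dispatcher. This is only an analogy written in Python: the real products hook the operating system's call mechanism at a much lower level, and the function names, the drive-letter test, and the handler signatures here are all assumptions made for the sketch.

```python
def make_trap(cdrom_drives, cdrom_handler, host_handler):
    """Build a function that stands in front of the host's file calls.

    Calls aimed at a CD-ROM drive go to `cdrom_handler`; every other
    call is passed through to `host_handler` unchanged.
    """
    cdrom_drives = frozenset(d.upper() for d in cdrom_drives)

    def trap(function, drive, *args):
        if drive.upper() in cdrom_drives:
            return cdrom_handler(function, drive, *args)
        return host_handler(function, drive, *args)

    return trap
```

Because the caller always invokes the same `trap` function, it has no way of knowing which handler actually served the request, which is the transparency the text describes.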


B.  COMPILER  LIMITATIONS 

Some compilers used in writing applications that address the file system can in
themselves limit the size of files. For example, MS-PASCAL (TM) (versions [illegible],
3.2) limits the size of files to eight megabytes. C86 (TM) (version 1.2) has the same
limit. Lattice C (TM) (version 2.1X), on the other hand, is not limited in this way.
Reference Technology's Standard File Manager limits itself to file sizes of two Gbytes,
but the compiler must be capable of producing code that can access a file of this size.
PC-DOS has the same two-Gbyte file size limitation as the Standard File Manager if
files are accessed through the Standard File Manager "file handling" functions.
(Standard, 1986, p. 2.12)

Another potential limitation from compilers is that some restrict the number of
files that can be open at one time. For instance, Lattice C (TM) (version 2.1X) has a
limit of 20, including the standard input, output, and error files, as well as any hard
disk or diskette files. The Standard File Manager for CD-ROM systems allows up to
[illegible] files to be open simultaneously.

C. PC-DOS ADAPTATION

One of the more frustrating things about using CD-ROM with IBM PCs is the
limitation placed on the size of a logical disc volume by the PC-DOS operating system.
It is only 32 megabytes, a mere thimbleful compared to the 540 megabytes typically
available on a single CD-ROM. Fortunately, there are several ways to sidestep this
limitation. One relatively easy way is to surrender to PC-DOS and break the disc into
32-megabyte partitions.

However, the most powerful method to get around the size limitation involves a
new interrupt handler. It may also be necessary for the file-management system as well
as the directory, depending on how the particular system is set up. By trapping the
operating system interrupt, the interrupt handler can intercept calls intended for the
CD-ROM while other calls are simply passed through. Once intercepted, the CD-ROM
calls can be treated differently, still maintaining system transparency to the user.

The difficulty arises when the interrupt handler must also support every disc call
in exactly the same way as PC-DOS supports them. Those calls include functions that
open files, read from files, check for remaining disc space, and so forth. Supporting all
of those functions necessitates a tremendous amount of code generation.
IX. ANALYSIS AND DISCUSSION


A.  SHIPBOARD  USE  OF  CD-ROM 
1. Departmental Applications

There are many applications for CD-ROM systems on board U.S. Navy
vessels. Such applications will decrease the ship's weight (by eliminating paper storage
media) and make more space available. The advantages, disadvantages, and
possible solutions to the problems are addressed below.

The Navigation department should store its hundreds of charts on CD-ROM
and eliminate a majority of its bulky chart cabinets. The system would store the charts
in ascending order according to chart number and would also provide a cross-reference
index for user assistance. The system would prompt the user to enter the number of the
chart he wishes to see and then display that chart on the monitor. However, there
must be a system on board for reproducing these charts on paper so that
corrections, courses, fixes, and coordinates can still be plotted. The technology needed
to reproduce NOAA charts in various scales is now available from LaserPlot,
Inc. (Belanger, 1987, p. 13).

The Operations department should use CD-ROM to hold its classified
publications. Security will be better because there will be fewer classified materials to
be monitored. Confidential material would be kept on one CD-ROM, Secret material
on another, and Top Secret material on still another. However, in environments such
as MS-DOS, security is breached when a person with the "need to know" about
a certain topic has access to all other classified information that resides on the disc he
happens to be reading. In that case, software would have to be developed in which the
ship's CMS custodian would control a "read denial" lock for each classified file. The
operating system would not relinquish control to the CD-ROM file manager without
checking the lock status. The lock could only be set or reset according to a program
executed by the CMS custodian. No file could be opened and read without the
custodian's knowledge and approval. An individual would sign for the CD-ROM and
the CMS custodian would release the locks on those files that the user is qualified to
read. Upon the return of the classified disc the locks would be reset. Another
particularly helpful CD-ROM application in the Operations area involves "signal"
or tactical communications. Such an application should be written to search
through tactical publications such as NWPs and AWPs and break coded signals,
thereby ensuring timeliness and accuracy in situations that can be and often are
critical. The tactical officer would key in the coded signal phrase and the system would
search its database for that particular sequence of words. The results would be
displayed on monitors located on the ship's bridge and in CIC.

The Engineering department maintains a vast number of operating manuals,
technical manuals, repair manuals, and schematics. The transfer of these from paper to
CD-ROM would certainly reduce weight and increase available departmental space.
The engineers would also have access to many more manuals, blueprints, and technical
publications not normally carried on board. But how is a repairman going to get a
repair manual to the scene of a repair? Must he go to a CD-ROM reader and print out
the applicable pages? The answer is a qualified yes. A repairman will usually have to go
to a centralized location to check out a manual. Disc readers and printers should be
placed in these strategic locations in order to minimize the inconvenience. In certain
circumstances, with the use of some advanced technology, a print-out may not be
necessary.

The Supply department should use CD-ROM to store its wide variety of
catalogs, parts lists, and various other publications. Cookbooks and recipes would no
longer be lost or misplaced. All of these potential uses would be complemented by the
CD-ROM's ability to store visual images. The supply clerks can see exactly what they
are ordering and thereby reduce errors that often result from making assumptions or
guessing about item uncertainties. Moreover, CD-ROMs already contain the Navy
Management Data List (NMDL) and Parts I and II of the Master Repair Items
Listing (MRIL), which are distributed by the Navy Publications and Printing Service.
NAVSUP also sponsored the TLOCD project done here at the Naval Postgraduate
School.

The Administration department would no longer have to print and distribute
copies of Navy-wide regulations and instructions throughout the ship. The drawback
here is lack of shipboard portability. For example, the person desiring the
information must be in the immediate vicinity of a CD-ROM disc reader. He cannot go
to his stateroom, relax, and thumb through the newest instruction or regulation
unless, of course, there happens to be a CD-ROM disc reader in his stateroom. This
scenario is not unrealistic. Considering that the total cost of a disc reader, monitor,
[text missing in source] could be placed in nearly all the spaces on board the ship.
Costs could be reduced further if a networking system were implemented and public
terminals made available to the crew. One possible networking scheme would involve a
modem-to-modem machine interface using the ship's telephone lines. However, this
method might interrupt routine shipboard communications by tying up the phone lines.
A better solution would involve the development of a local area network (LAN) which
would allow as many users as there were system hook-ups. Each compartment would be
wired so that portable terminals could be supported. The structure would be relatively
simple for such a system and could be supported by a common network topology such
as a ring. The decision to implement a LAN or to pursue a certain network topology
across a particular class of ships should be made by NAVSEA based upon fleet
managerial requirements determined by individual ship needs.

2. CD-ROM Impact on the Paperless Ship

Every officer and petty officer aboard every Navy ship has at one time or
another become frustrated by the unending flow of required paperwork and the
plethora of information in technical manuals and documents that must be available,
read, and studied. Cumulatively, their weight is in tons. VADM J. Metcalf III states,

I find it mind-boggling. We do not shoot paper at the enemy. We do not train
sailors to be registrars and correctors of publications. I want those guys worried
about fighting, not worrying about keeping up the publications.

The admiral has launched an initiative to create a "paperless" ship by 1990 as a first
step toward driving paper from the entire fleet. The first ship would be a frigate, he
said, that will probably be equipped with different types of electronic information
systems. (Metcalf, 1987, p. 35)

CD-ROM technology is only a piece of the puzzle when it comes to putting
together such a system. One must consider the feasibility of making CD-ROM disc
readers accessible to all departmental and divisional offices as well as in CIC, DCC, the
Bridge, engineering spaces, and staterooms. The initial cost would be considerable but
would be offset in a short while by the reduction in mailing costs of optical discs as
opposed to paper. See Figure 9.1 for a comparison between mailing costs of CD-ROM
and other storage media.

Keyboards,  monitors,  printers,  and  disc  readers  must  be  kept  in  a  relatively 
cool  environment  in  order  to  reduce  downtime  and  maintain  operational  readiness. 

Figure 9.1 CD-ROM Mailing Comparison (equivalent mailing costs, in dollars).

Many ships are not currently capable of producing such an environment with any
efficiency, especially in humid climates such as the Persian Gulf, Indian Ocean, or
Caribbean Sea. The newer ship classes, however, should not experience as many
problems because additional electronics needs are being addressed in the ships' original
design. Furthermore, the loss of ship's power could prevent timely access to important
data. In that case, it would be necessary that a paper copy of such data be stored on
board. An alternative solution would be to require each major user to have his own
back-up power source such as a UPS (uninterruptible power supply), which runs off
its own battery pack until a diesel or gas engine is started and begins to produce
power. It is possible to have a UPS for the entire shipboard computer system,
but it would require larger battery packs. The decision on how to employ UPS is again
strictly a managerial one based on individual ship characteristics and goals.

Another problem that surfaces involves applications such as personnel or
disbursing transactions that require constant change or update. Write Once Read
Many (WORM) optical technology may be the solution in these cases. Other emerging
technology that may be available in the near future includes erasable optical discs,
which function in much the same way as a standard floppy disk. The goal of a
paperless ship is certainly attainable if CD-ROM is used in conjunction with other
electronic media such as WORM. However, in order for this to happen, ships must
maintain a cool operating environment, shipboard portability issues must be resolved,
and the use of additional electronic data storage methods to compensate for the
CD-ROM's weaknesses must be available and cost effective.

B.  CD-ROM  FOR  SHORE  FACILITIES 
1.  Database  Design 

The use of CD-ROM at U.S. Navy shore facilities must be tailored to fit the
needs of the particular command. The storage and retrieval of massive amounts of
historical data is the primary consideration for implementing a CD-ROM system such
as the TLOCD system at NSC Oakland. Database design demands considerable
attention from facilities wishing to effectively capitalize on the read-only nature of
CD-ROM technology. Of particular concern is the format of the database. CD-ROM
databases may consist of a number of files, each file consisting of similar records
with the same logical format. Since a database from a CD-ROM perspective is a
collection of similar files concatenated together, a single optical disc may contain many
distinct databases of different file types. In this case, the TLOCD system actually
involves three distinct databases, one each for the transaction files, closing balance
files, and audit trail files.

When designing a database, attempts should be made to maximize the
system's storage allocation potential. This consideration was neglected in the TLOCD
design. Consequently, many of the records in each of its three databases contain data
common to records in the other two databases. For example, the National Item
Identification Number (NIIN) and date fields are found in all three record types of the
TLOCD system. This data redundancy across databases should be avoided whenever
possible in order to achieve a higher level of storage efficiency.

Care should be taken not to merge separate entities such as the TLOCD
databases in an attempt to delete redundant information. Such an attempt could lead
to continued data redundancy and loss of valuable data. Note the result when
three fictitious file tables of the TLOCD system are merged into a
single table made up of tuples that represent data records. Notice that there are no
entries in some of the record fields. The space must still be maintained and is virtually
wasted. Now notice the data redundancy among the record fields. Furthermore, if a
record were ever to be damaged or destroyed, the audit trail data for that date would be
lost, resulting in an inaccurate historical account of inventory items. That is the reason
why multiple entities should not be routinely merged into a single table to reduce
redundancy when designing a database for a particular system.

2.  Cost  Effectiveness 


Businesses today are constantly in search of managerial tools and
manufacturing procedures that reduce overhead and still maintain product reliability.
The U.S. Navy is no different. There are two specific areas in CD-ROM projects such
as TLOCD where costs could be trimmed. The first such area deals with indexing. The
total cost for preparing and creating the TLOCD indexes exceeded $9,000 (find. 19S0.
p. [...]). The Navy may benefit from providing its own indexing and utilizing the $9,000 in
cost savings elsewhere. Any Navy facility with sufficient computer hardware can create
the indexes required for CD-ROM manufacturing. In fact, there are hardware and
software products now available that can perform all stages of CD-ROM production
through the premastering stage. The CD Publisher from VideoTools is one such
product. However, it would be a simple task to assign the job of indexing to a
minicomputer which could grind out the results in batch mode. The main concern would be
in deciding the type of index structure to use for the particular application in order to
[...]. Therefore, some knowledge of CD-ROM indexing would be [...]


The second area in which costs could be trimmed involves application
software. The TLOCD application specific software was created at a cost of about
$[...]. Qualified Navy personnel can create programs to access the TLOCD database
using the library of C language functions already resident in the Key Record Manager.
Programmers having experience in a high level language should be able to develop
sufficient C programming skills within a short time and then produce programs for
TLOCD and other naval applications. Granted, it is necessary to purchase software
such as the Key Record Manager to interface with the CD-ROM file management
system or else write an independent interface. However, writing such an interface might
not seem very prudent, since the time and cost to develop and debug it would certainly
exceed the cost of an already proven product such as Key Record Manager, which
has been sold commercially for under $2[...]. Furthermore, such a task would require a
great deal of systems programming in a language such as C at a time when DoD has
declared ADA to be the primary language to be utilized in future military projects.
Since most CD-ROM access software on the market today is C-language oriented, the
Navy should direct research toward developing ADA programs to drive CD-ROM
applications. There are indications from the CD-ROM industry that ADA interfaces
will be available on the consumer market within a few months. An alternative to this
approach would be an interface written to accommodate any compiled code
recognizable in the operating system extensions, therefore allowing several different
compiled languages to access it.


C.  TLOCD  PROTOTYPE  IMPROVEMENT 
1.  Proposed  System  Modification 

As stated previously, the current TLOCD system accesses and searches three
distinct databases in order to obtain transaction, closing balance, and audit trail
information for inventory item inquiries. The system should be modified by extracting
the redundant data from the databases without destroying the separate entities or
relations among the three file types. This could be accomplished by restructuring the
files. Duplicate data would be removed from the three files and placed in a separate
table or "NIIN file" which is then linked to the other tables via multiple pointers from
the NIIN table or via a chaining mechanism from one table to the next. Although the
number of tables is now increased by one, such an arrangement does not imply
inefficiency. The data storage capacity is increased and the tables remain as separate
entities to be used for other purposes. This new structure would provide three TLOCD
files without duplicate data in such a way that the separate entities associated with the
TLOCD files each have attributes that apply to that particular entity. Therefore, the
storage requirement is reduced without removing the idea of separate entities, which is
a requirement for TLOCD system control.
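
The restructuring proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TLOCD implementation: the field names (noun_name, quantity, balance, action), the sample values, and the five-digit Julian date format are hypothetical, and lists of record positions stand in for the on-disc pointers. Each record is assumed to keep its own date.

```python
from dataclasses import dataclass, field

@dataclass
class NiinEntry:
    """One row of the hypothetical NIIN table holding the shared data."""
    niin: str
    noun_name: str                                     # assumed shared field
    transactions: list = field(default_factory=list)   # positions in transaction file
    balances: list = field(default_factory=list)       # positions in closing balance file
    audits: list = field(default_factory=list)         # positions in audit trail file

# The three files stay separate entities; their records no longer repeat the NIIN.
transaction_file = [{"date": 86001, "quantity": 12}, {"date": 86015, "quantity": -4}]
balance_file = [{"date": 86031, "balance": 8}]
audit_file = [{"date": 86015, "action": "ISSUE"}]

# The NIIN table links one NIIN to its records in each of the three files.
niin_table = {
    "001234567": NiinEntry("001234567", "VALVE, GATE", [0, 1], [0], [0]),
}

def lookup(niin):
    """Follow the pointers from the NIIN table into all three files."""
    entry = niin_table[niin]
    return {
        "niin": entry.niin,
        "noun_name": entry.noun_name,
        "transactions": [transaction_file[i] for i in entry.transactions],
        "balances": [balance_file[i] for i in entry.balances],
        "audits": [audit_file[i] for i in entry.audits],
    }
```

A call such as lookup("001234567") gathers transaction, closing balance, and audit trail data in one step, while the NIIN itself is stored only once.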

2.  Functional  Design  Issues 

In  designing  a  system  such  as  TLOCD  there  are  three  issues  of  primary 
concern: database access, data search, and data retrieval. These criteria will now be
discussed in relation to the proposed TLOCD modifications.

Accessing the TLOCD database involves locating and "opening" its index and
data files. The access function must search the CD-ROM database directory for the
database name provided by the user or the user's program. The address of a File
Control Block (FCB) is acquired from the database directory. The FCB will contain a
pointer to a list of the key record indexes used for searching the database. It also will
contain a pointer to the beginning address of the actual data on the CD-ROM. This
"double-pointer" configuration allows the system to search a specified index for a key
record value and acquire the relative address of the record within the data file. The
pointer within the data file is then utilized to locate the record. In this way the integrity
of the pointers can be maintained and subsequent searches can be conducted relative to
the current pointer positions. Such an access function requires two parameters: the
database name as an input parameter and the database address as an output
parameter.
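
A minimal sketch of such an access function is given below, assuming an in-memory directory. The names (FileControlBlock, DIRECTORY, FCB_TABLE, open_database) and the address values are hypothetical and are not taken from the Key Record Manager library. In Python the output parameter becomes a return value.

```python
from dataclasses import dataclass

@dataclass
class FileControlBlock:
    """The 'double-pointer' block described in the text."""
    index_list_addr: int   # pointer to the list of key record indexes
    data_start_addr: int   # pointer to the beginning of the actual data

# Hypothetical database directory: database name -> address of its FCB.
DIRECTORY = {"TLOCD": 0}
# Hypothetical FCB storage, keyed by address (values are made up).
FCB_TABLE = {0: FileControlBlock(index_list_addr=2048, data_start_addr=65536)}

def open_database(name, directory=DIRECTORY, fcb_table=FCB_TABLE):
    """Look up a database by name and return (database address, FCB).

    Raises KeyError when the name is not in the directory.
    """
    address = directory[name]
    return address, fcb_table[address]
```

The returned address would then be passed to the search function, and the two pointers in the FCB tell it where the indexes and the data begin.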

The primary objective of the TLOCD system is to obtain historical data about
a particular NIIN for a specified date. Therefore, the most important fields within the
data records are the NIIN and date fields. The NIIN is used to generate a key record
index. The date field is not used as an index generator. It would not provide a
practical key record index since there could be possibly hundreds or thousands of
transactions conducted on that particular date. Other fields that would generate
adequate key record indexes include the National Stock Number (NSN) and the
product noun name. However, since the TLOCD system users deal primarily with the
NIIN and seldom have the need for additional identifiers, no other key indexes would
be utilized on a regular basis.




Normally, indexes are numbered sequentially and the user is queried as to
which index he desires to search. However, since only the NIIN index is to be created
for the TLOCD system modification, no query is needed and the NIIN index is
selected by default. The user is prompted to enter the NIIN and the date if it is known
or desired. The NIIN is located in the index via a balanced tree search. A pointer is
then followed to a list of date records containing the dates on which the NIIN was
transacted and the offsets of their associated NIIN records within the file. The dates
are listed in ascending numerical order according to their Julian equivalents. The
NIIN record offset is retrieved, the record address is computed, and the pointer is moved to
the desired record of the NIIN file. Input parameters for such a search function
include: (1) the database address, (2) the index to be searched, (3) the NIIN, and (4)
the date. The function will return the record offset in relation to the NIIN file origin.
If no date is specified, the function will return the offset for the earliest recorded
transaction for the specified NIIN. See Figure 9.3 for an illustrative example.
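
The search just described can be sketched as follows. Several simplifications are assumed: a binary search over a sorted key list (Python's bisect module) stands in for the balanced tree, the index is passed in as two parallel lists rather than located through the database address, and the record length, NIIN values, and five-digit Julian dates are hypothetical.

```python
from bisect import bisect_left

RECORD_LENGTH = 80  # assumed fixed record length in bytes

# Hypothetical NIIN index: sorted keys, each with a list of
# (julian_date, record_number) pairs in ascending date order.
NIIN_KEYS = ["001234567", "009876543"]
DATE_LISTS = [
    [(86001, 0), (86015, 1), (86120, 2)],
    [(86010, 3)],
]

def search(niin, date=None, keys=NIIN_KEYS, date_lists=DATE_LISTS,
           record_length=RECORD_LENGTH):
    """Return the byte offset of the record relative to the file origin.

    Returns None when the NIIN, or the date for that NIIN, is not found.
    With no date, the earliest recorded transaction is returned.
    """
    pos = bisect_left(keys, niin)            # stand-in for the tree search
    if pos == len(keys) or keys[pos] != niin:
        return None
    dates = date_lists[pos]
    if date is None:
        return dates[0][1] * record_length   # earliest date comes first
    # (date,) sorts before every (date, n), so this finds the first match.
    i = bisect_left(dates, (date,))
    if i < len(dates) and dates[i][0] == date:
        return dates[i][1] * record_length
    return None
```

If several transactions share a date, this sketch returns the first of them; the remaining records could then be reached by moving forward within the file.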

Once the record is located in the data file its contents must be retrieved and
displayed for the user. There are various methods that can be used to achieve the task.
One such method involves the use of a function similar to the "scan" function found in
the C programming language. In such a technique, the record is treated as a string of
bytes and the string is "scanned" or read into a buffer. The contents of the buffer are
then displayed on the screen. In order to make any sense of the data, other functions
must be called upon to format the record string into a readable medium. The record
size must be known so the scan function can determine how many bytes to transfer
into the buffer. This poses no problem for the TLOCD system since its records are of
fixed length. However, for variable length records, the scan function would have to be
designed to look for a length field at the beginning of each record, or else receive the
information from the search function. Data retrieval can be similarly executed by string
manipulation functions commonly found in such programming languages as Pascal and
ADA. Retrieval programs written in C warrant more consideration due to the
language's powerful screen formatting functions.
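
Both retrieval cases can be illustrated with a short sketch. The record layout used by format_record (a 9-byte NIIN, a 5-byte Julian date, then free text) and the 2-byte big-endian length field are assumptions made for the example, not the TLOCD format, and an in-memory byte stream stands in for the CD-ROM data file.

```python
import io
import struct

def read_fixed(f, offset, record_length):
    """Read one fixed-length record into a buffer."""
    f.seek(offset)
    buf = f.read(record_length)
    if len(buf) != record_length:
        raise EOFError("record extends past end of file")
    return buf

def read_variable(f, offset):
    """Read a record preceded by an assumed 2-byte big-endian length field."""
    f.seek(offset)
    header = f.read(2)
    if len(header) != 2:
        raise EOFError("missing length field")
    (length,) = struct.unpack(">H", header)
    buf = f.read(length)
    if len(buf) != length:
        raise EOFError("record extends past end of file")
    return buf

def format_record(buf):
    """Turn the raw bytes into a readable line (hypothetical layout)."""
    niin = buf[:9].decode("ascii")
    date = buf[9:14].decode("ascii")
    text = buf[14:].decode("ascii").rstrip()
    return f"NIIN {niin}  DATE {date}  {text}"

# Usage with an in-memory stand-in for the data file:
data_file = io.BytesIO(b"001234567" + b"86015" + b"ISSUE 4 EA".ljust(16))
line = format_record(read_fixed(data_file, 0, 30))
```

For fixed-length records the caller supplies the record size; for variable-length records the size comes from the length field, as the text suggests.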

3.  Other  Issues

Figure 9.3 Search for Specific NIIN.

No system design can afford to ignore the needs and desires of its user
environment. Systems that are not user friendly seldom make an impact in the market
place. TLOCD user response, which is essential in this regard, has indicated
dissatisfaction with the "page up" and "page down" functions that permit users to move
forward or backward within the data file only one record at a time. They would benefit
from a scroll function which would allow them to move forward or backward within the
file any number of records. Such a function would not be hard to implement and would add
flexibility for users. The user would provide an integer (positive or negative) input for
the number of records he wishes to scroll over. Since the records are of fixed length,
such a function could readily compute the new position of the record in the data file
and then reposition the pointer to that location. The function would require three
input parameters: (1) current pointer position, (2) record length, and (3) number of
records to scroll. It would pass the new record location as an output parameter. An
attempt to scroll past the beginning or end of the data file would result in retrieval of
the first or last record in the file.
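
A scroll function of this kind reduces to simple arithmetic, as the sketch below suggests. It assumes positions are byte offsets from the file origin and that the file holds at least one record. The file_length parameter is an addition not listed in the text; it is assumed here because the function needs some way to detect the end of the file.

```python
def scroll(position, record_length, count, file_length):
    """Return the byte offset of the record `count` records away.

    position      -- byte offset of the current record from the file origin
    record_length -- fixed record length in bytes
    count         -- records to scroll (positive forward, negative backward)
    file_length   -- total data file length in bytes (assumed parameter)
    """
    last = (file_length // record_length - 1) * record_length
    target = position + count * record_length
    # Clamp so that overshooting lands on the first or last record.
    return max(0, min(target, last))
```

For example, with 80-byte records in an 800-byte file, scrolling two records forward from offset 160 gives offset 320, while scrolling 100 records forward stops at the last record, offset 720.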




Another issue to be concerned with is the arrangement of data on the
screen. The current TLOCD screen interface displays a transaction record for a specific
NIIN and then queries the user as to whether he wants to view a closing balance or
audit trail record for the NIIN. Therefore, the user is aware that he must deal with
three separate groups of files. The user has no need to know such information and the
system should make it transparent to him. Furthermore, the screen interface should
display data from across all three TLOCD relations upon each NIIN inquiry. The
result would be a fuller screen with multiple records being used to provide transaction,
closing balance, and audit trail data about the NIIN. The need no longer exists to
prompt the user after each NIIN search about closing balance or
audit trail records.

The design of a user-friendly interface to a system is a complex one and goes
beyond the scope of this thesis. The above examples serve to illustrate that these issues
must be carefully analyzed to provide user satisfaction.
X.  CONCLUSIONS  AND  RECOMMENDATIONS


The U.S. Navy is constantly exploring, experimenting, and seeking new
technologies in order to maintain a tactical advantage over its adversaries. CD-ROM
technology warrants immediate attention and funding for implementation and
applications development.

CD-ROM applications provide a potentially valuable commodity to the U.S.
Navy at shore facilities and on board ships at sea. The product is already proven and
the financial risks are minimal. Major shore facilities should proceed and adopt plans
to convert their permanent and archival databases to CD-ROM applications such as
the TLOCD system. The technology is available and is already starting to earn a
significant niche in the electronic data processing industry. Although an
implementation reflecting the proposed TLOCD modifications presented in the
previous chapter cannot be carried out within the scope and time frame of this thesis, it
can be determined from the information presented that such an implementation is
plausible and doable within U.S. Navy environments.

CD-ROM is the catalyst that will eventually lead to the first paperless ship. Its
use in conjunction with other developing electronic technology such as WORM makes
the goal reachable. The Navy should designate a ship to function as a prototype for
CD-ROM conversion. The prototype must apply sound database design principles
such as those emphasized in this study in order to produce efficient and effective
performance. It must also address the functionality of the user interfaces designed for
each specific application on an independent basis. If these guidelines are followed, the
CD-ROM applications will produce immediate cost savings and increase efficiency and
operational readiness by providing faster access to critical data. If current research and
development cannot economically produce a feasible optical storage solution (such as
WORM or erasable discs) for constantly changing data, then the chances for a
"paperless" ship in the near future are greatly reduced. Regardless of that outcome,
CD-ROM will remain reliable and cost-effective for shipboard use providing proper
analysis is conducted prior to system integration.